ABSTRACT

The present work deals with parallel algorithms for
problems related to the generation of combinatorial objects on
the SIMD (SM-CREW) computer. Parallel algorithms are presented
for generating a random sample, generating binary trees, and
generating a random rooted unlabeled tree. In addition,
sequential algorithms for generating a random binary tree are
proposed.

The algorithms for generating a random sample of k out
of n items achieve an optimal speedup ratio. They run in
O(log k) time using O(k) processors and O(k) memory. The
random sampling algorithm is then used to generate a random
permutation of n objects in O((log n)^2) time using O(n)
processors.

The random unlabeled rooted tree is found in O(log k)
time using O(k) processors.

Two algorithms are presented for lexicographic
generation of binary trees with n nodes. The first algorithm
takes O(BT(n)) time with O(n) processors, while the second
takes O(⌈BT(n)/P⌉ · n) time, where P is the number of processors
and BT(n) is the total number of binary trees with n nodes.

The sequential algorithms presented for generating a
random binary tree take optimal O(n) time using O(n) space.

CHAPTER 1

INTRODUCTION

1.1 PARALLEL COMPUTING AND COMBINATORIAL ALGORITHMS

There is a fundamental limit imposed on the computing
power of a sequential computer by what is now famous as
the 'speed of light' argument. The speed of light is
3 x 10^8 m/s in vacuum and 3 x 10^7 m/s in silicon. This imposes a
limit on the number of floating point operations per second
which can be done on a sequential machine. As the size of
the problems increases, the power of whatever sequential
machine is available is exhausted, and hence power beyond the
ability of a single processor machine is required. This
makes parallel processing inevitable for large scale problems.
Also, in a real-time environment parallel processing
is required to reduce the turn-around time, which is critical
to the performance of the system.

Several commercial parallel systems are available now,
like Intel's iPSC, Thinking Machines' Connection Machine, the
FPS hypercube supercomputer, and the BBN Butterfly system.
However, what we still do not know is how to program these
systems efficiently. We are good at understanding sequential
algorithms, but we have little experience with algorithms
for parallel processors. Though the area of Parallel Algorithms
has generated considerable research activity, a lot
more needs to be done before we can really understand the
mechanics of algorithm design for parallel machines. The
present work, which presents some parallel combinatorial
algorithms, is a small step in that direction.

The field of combinatorial algorithms deals with the
problems of performing computations on discrete, finite
mathematical structures. The practical importance of computations
which are combinatorial in nature has contributed to
the considerable research activity in this area. Sorting,
searching, generation of combinatorial objects, and graph
algorithms are some of the extensively studied areas of
combinatorial computing [RND78].

Following FLYNN's classification scheme [FL66], parallel
computers are broadly classified into SIMD (Single
Instruction Stream Multiple Data Stream) computers and MIMD
(Multiple Instruction Stream Multiple Data Stream) computers.
In SIMD computers, a stream of instructions is executed
by a number of synchronised processors, each operating upon
its own memory. In MIMD computers each processor has an
independent instruction counter and operates in a speed
independent manner. Most of the research in Parallel Algorithms
has been done for the SIMD model. For some major
results in the area of parallel algorithms see
[KP81], [AS85], [QD84].

A good deal of work has been reported on parallel
combinatorial algorithms on the shared memory model of the
SIMD computer (see section 1.2). Several algorithms for
parallel generation of combinatorial objects have been
proposed. MOR & FRAENKEL [MF82], CHEN & CHERN [CC86] and GUPTA
& BHATTACHARJEE [GB82], [GB83] have dealt with the problem of
parallel generation of permutations. Algorithms for parallel
generation of combinations have been proposed by GUPTA &
BHATTACHARJEE [GB81], CHEN & CHERN [CC86] and CHAN & AKL
[CA86].

1.2 MODEL OF COMPUTATION

A shared memory model of Single Instruction-stream Mul- 
tiple Data-stream (SIMD) computer is used as the model of 
computation for the parallel algorithms described in the 
subsequent chapters of this thesis. This model has been used 
by many authors for a wide variety of problems. 

In this model there is a master processor and a number
of slave processors. It is assumed that the slave processors
have a limited amount of local memory and that there is an
unbounded global memory which these slave processors can
access. The master processor broadcasts instructions to the
slave processors. The slave processors in active or enable
mode execute the instruction broadcast by the master processor
at the same time. It is further assumed that all the
processors are synchronised in the sense that if a set of
instructions is executed in parallel, then each must be
allowed to finish before the next set of instructions is
started.

In this model no two slave processors are allowed to
write to the same location of the global memory; however,
two or more slave processors can read from the same location
of the global memory. This model is known as a Parallel
Random Access Machine (PRAM) which allows Concurrent Read
and Exclusive Write (CREW).

For measuring time complexity of an algorithm designed 
for this model, it is assumed that the cost of one execution 
of an instruction broadcast by the master processor is one 
unit time, irrespective of the number of slave processors 
that may be active. The requirement of the processors is 
determined on the basis of the maximum number of the slave 
processors active at an instant during the execution of the 
entire algorithm. 

1.3 LANGUAGE CONSTRUCTS 

The following constructs have been used to present the 
algorithms in this thesis: 

The parallel loop involving n processors is represented
by the following statement:

    for i := p to p+n-1 pardo
        steps to be executed
        in parallel;

Here p is the starting index of the active processors. Similarly,
the statements to be executed in parallel can be
denoted by:

    parbegin
        statements to be executed
        in parallel
    parend;

Sequential begin and sequential end are
represented by begin and end respectively.

An algorithm is presented in the form of a procedure
with appropriate declarations. The prefix par in a procedure
name indicates that it refers to a parallel algorithm;
otherwise it is a sequential algorithm. Besides
input and output parameters, the argument list of parallel
procedures includes the starting index of the set of processors
and the total number of active processors. A typical
example of a procedure statement is provided below.

    proc par_name (p,q; input; output);

    where  p      : starting index of the set of
                    active processors,
           q      : total number of active
                    processors,
           input  : list of input variables
                    separated by commas,
           output : list of output variables
                    separated by commas.



Let us consider some examples of parallel procedures to 
illustrate the above constructs. 

The first example considers the following problem:
Given n integers in an array A[1:n], generate the following n
sums:

    SUMA[i] := \sum_{j=1}^{i} A[j],   1 ≤ i ≤ n.

The algorithm proceeds in ⌈log n⌉ steps as follows:


proc par_find_parsum ( p,n ; A,n ; SUMA );
var
    /* input */
    n : integer;
    A : array [1..n] of integer;
    /* output */
    SUMA : array [1..n] of integer;
    /* local */
    i,u : integer;
    FAR : array [1..n] of 1..n;
begin
I1: /* initialize */
    for i := p to p+n-1 pardo
    begin
        u := i-p+1;
        FAR[u] := u+1;
        SUMA[u] := A[u]
    end
    dopar;
    FAR[n] := n;

I2: /* Do folding operations on the FAR pointer for
       ⌈log n⌉ steps. */
    for ⌈log n⌉ steps do
        for i := p to p+n-1 pardo
        begin
            u := i-p+1;
            if u <> FAR[u] then
                SUMA[FAR[u]] := SUMA[FAR[u]] + SUMA[u];
            FAR[u] := FAR[FAR[u]]
        end
        dopar
end;

LEMMA 1.1

The procedure par_find_parsum runs in O(log n) time
with O(n) processors.

Proof: see ref. [AS85].  □
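As an illustration of the folding idea, the following is a minimal Python sketch that simulates the lock-step rounds sequentially. It is not the thesis procedure itself: the name `par_find_parsum_sim`, the 0-based indexing, and the `NIL` sentinel (used here in place of `FAR[n] := n` so that each cell receives at most one contribution per round) are assumptions made for this example.

```python
from itertools import accumulate


def par_find_parsum_sim(a):
    """Simulate pointer-jumping prefix sums; each inner iteration is one 'processor'."""
    n = len(a)
    nil = n                           # assumed sentinel: "no cell further to the right"
    suma = list(a)
    far = [u + 1 for u in range(n)]   # far[n-1] == nil
    rounds = (n - 1).bit_length() if n > 0 else 0   # equals ceil(log2 n) for n >= 1
    for _ in range(rounds):
        # Snapshot the state so all processors read the values of the previous round.
        old_suma, old_far = suma[:], far[:]
        for u in range(n):
            t = old_far[u]
            if t != nil:
                suma[t] = old_suma[t] + old_suma[u]   # exactly one writer per target t
                far[u] = old_far[t]                   # jump: double the distance
    return suma


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2]
    print(par_find_parsum_sim(data))
    print(list(accumulate(data)))
```

Because processor u always points exactly 2^t cells to its right in round t, no two processors write the same cell, which matches the exclusive-write restriction of the model.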

The next example addresses the following problem: Given
a set of k integers in the range 1 to n (with some integers
having repeated occurrences) in an array SAMPLE[1:k], find
the set of distinct integers present in SAMPLE[1:k]. If
'items' is the number of distinct integers, then
SAMPLE[1:items] stores the output. The algorithm proceeds by
sorting the array SAMPLE, replacing all repeated occurrences
of any integer by n+1 (which is outside the range of the
integers), and then sorting the array again. All the distinct
integers are thus collected at one end of the array. The procedure
par_compress presents the parallel implementation of the
algorithm.

proc par_compress ( p,k ; SAMPLE,k,n ; SAMPLE,items );
var
    /* input */
    SAMPLE : array [1..k] of 1..n;
    /* output */
    SAMPLE : array [1..items] of 1..n;
    items : integer;
    /* local */
    i,u : integer
begin
I1: /* sort the array SAMPLE */
    par_sort ( SAMPLE[1:k] );

I2: /* set the duplicate items in SAMPLE to n+1. Note
       that the range of items in SAMPLE is 1..n */
    for i := p to p+k-2 pardo
    begin
        u := i-p+1;
        if SAMPLE[u] = SAMPLE[u+1] then SAMPLE[u+1] := n+1
    end
    dopar;

I3: /* again sort SAMPLE */
    par_sort ( SAMPLE[1:k] );

I4: /* find the value of the last index 'items' */
    items := k;   /* value when SAMPLE has no duplicates */
    for i := p to p+k-2 pardo
    begin
        u := i-p+1;
        if (SAMPLE[u] <> n+1) and (SAMPLE[u+1] = n+1) then
            items := u
    end
    dopar
end;


Here the procedure par_sort is the parallel sorting
procedure [CO86].

LEMMA 1.2 The procedure par_compress runs in O(log k) time
with O(k) processors.
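The sort, mark, sort, count pattern of par_compress can be sketched sequentially as follows. This is only an illustration: Python's built-in sort stands in for par_sort, and the function name `compress` and the returned tuple are assumptions of this example.

```python
def compress(sample, n):
    """Return (distinct values in increasing order, their count) for values in 1..n."""
    s = sorted(sample)                       # step I1: sort
    marked = s[:]
    for u in range(len(s) - 1):              # step I2: mark repeated occurrences
        if s[u] == s[u + 1]:
            marked[u + 1] = n + 1            # n+1 lies outside the valid range
    marked.sort()                            # step I3: markers move to the end
    items = sum(1 for x in marked if x != n + 1)   # step I4: count survivors
    return marked[:items], items


if __name__ == "__main__":
    print(compress([3, 1, 3, 2, 1], 5))
```

Reading from the sorted snapshot `s` while writing to `marked` mirrors the synchronous behaviour of the parallel loop, where all comparisons of a step see the values from before that step.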


1.4 AN OVERVIEW OF THE THESIS 

In the present work parallel algorithms have been
presented for some problems in generation of combinatorial
objects on the shared memory model of the SIMD computer.

Chapter 2 presents two parallel algorithms for generating
a random sample of k out of n objects in O(log k) time
using O(k) processors and O(k) memory. A random permutation
of n objects is then generated in O((log n)^2) time using O(n)
processors.

Chapter 3 presents a parallel algorithm for generating
a random unlabeled rooted tree with n nodes and k levels
in O(log k) time using O(n) processors. Also presented are
two sequential algorithms for the problem of generating a
random binary tree with n nodes in O(n) time using O(n)
space.

In chapter 4, parallel algorithms have been developed
for generation of all binary trees with n nodes,
lexicographically. If BT(n) is the total number of such trees, the
first algorithm takes O(BT(n)) time with O(n) processors.
The second algorithm takes O(⌈BT(n)/P⌉ · n) time, where P is
the number of processors used.



CHAPTER 2

GENERATION OF RANDOM SAMPLE

2.1. INTRODUCTION

The problem of randomly selecting k items out of n distinct
items, without replacement, arises in various applications
like quality control, market surveys, simulation studies
etc. Without loss of generality assume that the n items
are indexed 1 through n. So the above problem of drawing a
random sample of size k out of n items is reduced to the
problem of generating the indices of k items at random,
where each index is a positive integer not greater than n.

The above problem is rather simple to solve if one
allows either the memory storage used or the time complexity
(or the number of processors used in the case of parallel
computation) to be a function of n [NW75]. But in most practical
situations where random sampling may be used, n is
very large as compared to k. For example, consider the problem
of selecting 3 distinct integers from the set
{1,..,10000}, in which case we would like the time, space or
processor complexities of the algorithm to depend on '3'
rather than on '10000'.

Several algorithms exist for the problem of random sampling
for sequential computers. These algorithms can broadly
be classified into two categories. In the first category are
algorithms which select the items of the sample
in a linear order, i.e. in the increasing order of their
indices, on-line [VI84], [AD85].

The general form of algorithms in this category is as
follows:

    step 1. Generate a random integer d.

    step 2. Skip the next d items and select the following
            one for the sample.
            N := N - d - 1;
            k := k - 1;
            if k > 0 go to step 1.

All the algorithms differ mainly in how the skip distance
'd' is generated in step 1. It may follow a uniform,
geometric, or some other distribution. The best known algorithm
in this category runs in O(k) time on the average, but
only O(n) time in the worst case, using O(k) space. A comparison
of the major algorithms in this category has been
given in TABLE 1.


NAME OF       REFERENCE         AVERAGE TIME   WORST CASE TIME   SPACE
ALGORITHM                       COMPLEXITY     COMPLEXITY        COMPLEXITY
------------------------------------------------------------------------------
S             [FMR62], [JO62]   O(n)           O(n)              O(k)
A             [VI84]            O(n)           O(n)              O(k)
D             [VI84]            O(k)           O(n)              O(k)
SG, SH*, SG   [AD85]            O(k)           O(n)              O(k)

TABLE 1.
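The skip-and-select scheme of this category can be sketched in Python as below. The way the skip distance is drawn here (accepting the next item with probability k_left/n_left, in the spirit of selection sampling) is only one possible choice and is an assumption of this example, as are the names `skip_distance` and `sequential_sample`; the algorithms of TABLE 1 differ precisely in how d is generated.

```python
import random


def skip_distance(n_left, k_left, rng):
    """Number of items to skip before the next selection (assumed rule for this sketch)."""
    d = 0
    # The next unseen item is accepted with probability k_left / (n_left - d).
    while rng.random() * (n_left - d) >= k_left:
        d += 1
    return d


def sequential_sample(n, k, rng=None):
    """Select k of the indices 1..n in increasing order (requires 0 <= k <= n)."""
    rng = rng or random.Random()
    sample, pos = [], 0
    n_left, k_left = n, k
    while k_left > 0:
        d = skip_distance(n_left, k_left, rng)   # step 1
        pos += d + 1                             # step 2: skip d items, take the next
        sample.append(pos)
        n_left -= d + 1
        k_left -= 1
    return sample


if __name__ == "__main__":
    print(sequential_sample(10000, 3, random.Random(42)))
```

When the number of remaining items equals the number still required, the acceptance test always succeeds, so the sketch never runs past item n.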
The algorithms in the second category generate the sample
in an arbitrary order. The sample may or may not be generated
on-line. An algorithm in this category is the algorithm
select due to GOODMAN and HEDETNIEMI [GH77], which
takes O(n) time and O(n) space. Some of the more efficient
algorithms in this category are discussed in section 2.3.

The best known sequential algorithm for random sampling
which uses optimal O(k) memory [GUP] runs in O(k.log k) time
in the worst case. Thus it is interesting to design a parallel
algorithm whose product of time and processor complexities
is O(k.log k) and which uses O(k) space.

In this chapter, two parallel algorithms
(par_random_sample and par_random_sample2) for the problem
of 'Random Sampling' are presented. The algorithms run
on the Parallel Random Access Machine (PRAM) which allows
for Concurrent Read and Exclusive Write (CREW) in a shared
memory environment. Both the algorithms take O(log k) time
using O(k) processors and O(k) memory. Further, the random
sampling algorithm has been used to get a random permutation
of n items in O((log n)^2) time using O(n) processors.

Sections 2.2 and 2.3 present the algorithms
par_random_sample and par_random_sample2, while the random
permutation algorithm is presented in section 2.4.

2.2. AN ALGORITHM FOR PARALLEL RANDOM SAMPLING


The algorithm par_random_sample presented in this
section generates a random sample of k out of n distinct
items without replacement. The algorithm considers the case
when n ≤ k^2 separately from the case when n > k^2.

CASE 1: n ≤ k^2

In this case the problem of selecting the k items can
be redefined as follows: Select k_l items from the first n/2
items and k_r items from the last n/2, where k_l is chosen at
random in the interval [0,k], and k_l + k_r = k. Thus the k
items can be selected recursively. The recursion ends when
the number of items to be selected equals the number of
items present in the corresponding group of items, or when
the number of items to be selected equals 1. In case n is
odd at any stage, one item (with index 'temp'), which is
selected randomly, is rejected before starting the above
procedure. This step is implemented by first selecting the
sample out of the first n-1 items, and later incrementing
all those items which are greater than or equal to 'temp' by
one. The procedure par_rsamp_case1 implements case 1 of

the algorithm. The function RAND[1,n] computes a random
integer in the range 1 through n.

proc par_rsamp_case1 ( p,k ; n,k ; SAMPLE );
var
    /* input */
    k,n : integer;
    /* output */
    SAMPLE : array [1..k] of 1..n;
    /* local */
    i,u,temp,flag,Kl,Kr : integer
begin
    if k = n then
        for i := p to p+k-1 pardo
        begin
            u := i-p+1;
            SAMPLE[u] := u
        end
        dopar
    else
    if k = 1 then
        SAMPLE[1] := RAND[1,n]
    else
    begin
        flag := 0;
        if n is odd then
        begin
            temp := RAND[1,n];
            n := n-1;
            flag := 1
        end;
        begin
            Kl := RAND[0,min(k,n/2)];
            Kr := k - Kl;
            parbegin
                if Kl > 0 then
                    par_rsamp_case1 ( p,Kl ; n/2,Kl ; SAMPLE[1:Kl] );
                if Kr > 0 then
                    par_rsamp_case1 ( p,Kr ; n/2,Kr ; SAMPLE[Kl+1:Kl+Kr] )
            parend
        end;
        if flag = 1 then
            for i := p to p+k-1 pardo
            begin
                u := i-p+1;
                /* items at or beyond the rejected index move up by one */
                if SAMPLE[u] >= temp then
                    SAMPLE[u] := SAMPLE[u]+1
            end
            dopar
    end
end;

LEMMA 2.1

For n ≤ k^2 the procedure par_rsamp_case1 computes a
random sample of size k out of n in O(log k) time using O(k)
processors and O(k) memory.

Proof:

The time complexity of the algorithm is given by
    T(n,k) := T(n/2,k) + O(1)
or, T(n,k) := O(log n)

For k ≤ n ≤ k^2, log n ≤ log k^2 = 2 log k, so O(log n) = O(log k).
Since at any level of recursion we are selecting at most k items,
the processor complexity of the algorithm is O(k). By symmetry, it is
obvious that the sample selected by the above procedure is
random. (At every stage k is broken down at random into k_l
and k_r, and thus the probability of generating the partition
(k_l,k_r) is the same as that for the partition (k_r,k_l).) The
memory requirements of the algorithm are in the form of the
global array SAMPLE[1:k] and the temporary variables
'temp', 'flag' which take O(1) memory for each recursive call
to the procedure. There may be at most O(k) recursive calls
and hence, the total memory requirement is O(k).

□
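The structure of the case 1 recursion can be sketched sequentially in Python as below. This is a hedged illustration rather than the thesis procedure: the name `rsamp_case1`, the clamping of k_l to the feasible range [max(0, k - n/2), min(k, n/2)], and the explicit offset of n/2 added to the items drawn from the second half are assumptions of this example, and the two recursive calls that would run in parallel are simply executed one after the other. The sketch shows only that the recursion yields k distinct items in 1..n; it makes no claim about the resulting distribution.

```python
import random


def rsamp_case1(n, k, rng):
    """Return k distinct integers from 1..n by recursive halving (requires 1 <= k <= n)."""
    if k == n:
        return list(range(1, n + 1))
    if k == 1:
        return [rng.randint(1, n)]
    temp = None
    if n % 2 == 1:
        temp = rng.randint(1, n)      # index rejected up front when n is odd
        n -= 1
    half = n // 2
    # Assumed clamp: neither half may be asked for more items than it holds.
    kl = rng.randint(max(0, k - half), min(k, half))
    kr = k - kl
    left = rsamp_case1(half, kl, rng) if kl > 0 else []
    right = [x + half for x in rsamp_case1(half, kr, rng)] if kr > 0 else []
    sample = left + right
    if temp is not None:
        # Items at or beyond the rejected index shift up by one.
        sample = [x + 1 if x >= temp else x for x in sample]
    return sample


if __name__ == "__main__":
    print(rsamp_case1(25, 7, random.Random(1)))
```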

CASE 2: n > k^2

In case n > k^2, the n items are subdivided into ⌈n/k⌉
groups, each of size k (except one). Instead of directly
choosing a sample of k items, a group number lying in the
interval [1,⌈n/k⌉] is generated at random for each of the k
items to be selected for the sample. Let the group number i
be generated by k_i random variates, implying that k_i items
are selected from the i-th group. The following observations
can be made:

1. \sum_i k_i = k

2. The number of groups selected is at most k (since
   only k group numbers are selected).

Case 2 is further subdivided into Case 2.1 and Case 2.2
depending upon whether n is a multiple of k or not.

CASE 2.1: n is a multiple of k ( n = ⌈n/k⌉.k )

In this case it is ensured that the size of all groups
is equal to k. Since only k group numbers are generated, in
the worst case all the group numbers generated are equal to
some group i. Thus k_i = k, whereas k_j = 0 for j <> i. Hence
LEMMA 2.2 follows:

LEMMA 2.2

For any group i, the number of times it is selected
(k_i) does not exceed the number of items present in it
(= k), when n = ⌈n/k⌉.k.

□

LEMMA 2.3

If ⌈n/k⌉ > k then there exists a contiguous block of at
least k-1 items from which no item is selected for the sample
(assuming circularity of items, i.e. item number 1
follows item number n).

Proof:

Since we have assumed that ⌈n/k⌉ > k or n > k^2, in the
worst case all the items selected for the sample are evenly
distributed over the population. The minimum value of n is
k^2 + 1, in which case we select every k-th item and thus,
between any two items selected, there exists a gap of at
least k-1.

□

Knowing k_j for each j with k_j <> 0, we can break up the
original problem into subproblems for each such group j,
i.e. at the next stage of recursion we select k_j items from
group number j (having k items). The recursion ends when k_j
becomes greater than the square root of the size of group j, in
which case the algorithm par_rsamp_case1 is invoked.

CASE 2.2: n is not a multiple of k ( n <> ⌈n/k⌉.k )

In this case, if the n items are subdivided into ⌈n/k⌉
groups, one group will have (n - ⌊n/k⌋.k) items (at most k-1)
while the remaining ⌊n/k⌋ groups have k items each. However,
from LEMMA 2.3, we know that there exists a contiguous
block of size at least k-1 from which no item is selected
for the sample. So, a random variate 'breakpt' in the interval
[1,n] is generated, and (n - ⌊n/k⌋.k) items starting from
that point onwards are rejected to start with. Now, the
total number of items available is a multiple of k, and thus
the case is the same as case 2.1. The same technique can be
applied at subsequent steps of recursion. The rejection of
the (n - ⌊n/k⌋.k) items can be done by first storing the
pointer 'breakpt', then selecting the sample from the first
⌊n/k⌋.k items, and later modifying those sample items which
are greater than or equal to 'breakpt'. The modification
can simply be done by adding (n - ⌊n/k⌋.k) to each of the affected
items. If 'breakpt' is greater than ⌊n/k⌋.k, based on the
assumption of circularity of items, all the sample items are
modified.
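The index correction described above can be illustrated with a short Python sketch. It assumes that the rejected block starts at 'breakpt' itself and wraps around past item n; under that reading, when 'breakpt' exceeds ⌊n/k⌋.k the part of the block that wraps to the front has breakpt - ⌊n/k⌋.k - 1 items, which is the shift applied to every sample item. The function name `modify_sample` and this derivation of the wrap-around shift are assumptions of this example.

```python
def modify_sample(sample, n, k, breakpt):
    """Map a sample drawn from 1..asize back to 1..n, skipping the rejected block."""
    asize = (n // k) * k          # number of items actually sampled from
    remainder = n - asize         # size of the rejected block
    if breakpt <= asize:
        # Block lies inside 1..n without wrapping: items at or after it move up.
        return [x + remainder if x >= breakpt else x for x in sample]
    # Block wraps around: the items breakpt..n and the first `incr` items are rejected.
    incr = breakpt - asize - 1
    return [x + incr for x in sample]


if __name__ == "__main__":
    # n = 10, k = 3: nine items are sampled and one is rejected.
    print(modify_sample([2, 4, 9], 10, 3, 4))    # block is {4}
    print(modify_sample([1, 5, 9], 11, 3, 11))   # block is {11, 1}
```

In both printed examples the corrected sample avoids the rejected block and stays within 1..n.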

Based on the above discussion we have the following
major steps involved in selecting k out of n items when
n > k^2 (as presented in procedure par_rsamp_case2):

1. After dividing n into ⌈n/k⌉ groups and generating k
   group numbers randomly, finding the group numbers
   selected (GROUP[1:T]) and the number of times each one
   is selected (KI[1:T]) (procedure par_findki).

2. After solving the random sampling problem for each
   of the groups selected, modifying the sample generated
   to take care of case 2.2 (procedure
   par_modify_sample).

proc par_findki ( p,k ; n,k,GROUP ; GROUP,KI,T );
var
    /* input */
    n,k : integer;
    /* output */
    T : integer;
    GROUP : array [1..T] of 1..⌈n/k⌉;
    KI : array [1..T] of 1..k;
    /* local */
    STAGE : array [1..k] of 1..k;
    u,i : integer
begin
I1: par_sort ( GROUP[1:k] );

I2: for i := p to p+k-2 pardo
    begin
        u := i-p+1;
        if GROUP[u] = GROUP[u+1] then
            GROUP[u+1] := n+1;
        STAGE[u] := u
    end
    dopar;

I3: Sort the set A = { (GROUP[i],STAGE[i]) | i := 1,..,k }
    such that for i<j, GROUP[i] <= GROUP[j];

I4: for i := p to p+k-1 pardo
    begin
        u := i-p+1;
        if (GROUP[u] <> n+1) and (GROUP[u+1] = n+1) then
            T := u
    end
    dopar;

I5: for i := p to p+T-1 pardo
    begin
        u := i-p+1;
        KI[u] := STAGE[u+1] - STAGE[u]
    end
    dopar
end;

LEMMA 2.4

The procedure par_findki is correct.

Proof:

The array GROUP[1:k], which is the input to this procedure,
stores the generated group numbers. Step I1 sorts the
array GROUP[1:k]. Steps I2-I5 find, for each group
selected, the number of times it is selected. This is done
by manipulating a set A = { (GROUP[i],STAGE[i]) | i := 1,..,k }. In
step I2 STAGE[i] is initialized to i and every repeated
occurrence of any group number in GROUP[1:k] is replaced by
n+1 (n+1 is not a valid group number). Let T be the number
of distinct group numbers. The steps I3, I4 collect all the
distinct group numbers into GROUP[1:T]. Step I5 computes the
number of times each group is selected. If KI[u] is the
number of times GROUP[u] is selected, it can easily be
obtained by taking the difference of STAGE[u+1] and
STAGE[u], where STAGE[u] gives the starting index of
GROUP[u] for u := 1,..,T.

□
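The counting idea behind par_findki, namely that after sorting the multiplicity of a group equals the distance between the starting index of its run and the starting index of the next run, can be sketched sequentially in Python. The function name `find_ki` and the use of an explicit end marker k in place of STAGE[T+1] are assumptions of this example.

```python
def find_ki(group):
    """Return (distinct group numbers, multiplicity of each, number of distinct groups)."""
    k = len(group)
    g = sorted(group)                                   # step I1
    # Starting index of each run of equal group numbers (the surviving STAGE values).
    start = [u for u in range(k) if u == 0 or g[u] != g[u - 1]]
    t = len(start)
    groups = [g[s] for s in start]
    bounds = start + [k]                                # assumed end marker for the last run
    ki = [bounds[u + 1] - bounds[u] for u in range(t)]  # step I5: differences of start indices
    return groups, ki, t


if __name__ == "__main__":
    print(find_ki([4, 2, 4, 7, 2, 4]))
```

The multiplicities always sum to k, which is the first observation made for case 2.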


proc par_modify_sample ( p,T ; k,n,T,breakpt,SAMPLE ; SAMPLE );
var
    /* input */
    k,n,T,breakpt : integer;
    SAMPLE : array [1..k] of 1..n;
    /* output */
    SAMPLE : array [1..k] of 1..n;
    /* local */
    i,u,v,m1,m2,st,lt,rt,asize,incr,remainder : integer;
begin
I1: asize := ⌊n/k⌋.k;

I2: remainder := n-asize;

I3: if breakpt <= asize then
        for i := p to p+k-1 pardo
        begin
            u := i-p+1;
            /* items at or beyond breakpt skip the rejected block */
            if SAMPLE[u] >= breakpt then
                SAMPLE[u] := SAMPLE[u] + remainder
        end
        dopar
    else
    begin
        /* the rejected block wraps around; breakpt-asize-1 of its
           items lie at the front of the population */
        incr := breakpt - asize - 1;
        for i := p to p+k-1 pardo
        begin
            u := i-p+1;
            SAMPLE[u] := SAMPLE[u] + incr
        end
        dopar
    end
end;


The procedure par_rsamp_case2, which presents the
algorithm for case 2, calls the procedure par_find_parsum
of chapter 1 to compute, for 1 ≤ i ≤ T,

    SUMKI[i] = \sum_{j=1}^{i} KI[j].

proc par_rsamp_case2 ( p,k ; k,n,s,e ; SAMPLE );
var
    /* input */
    k,n,s,e : integer;
    /* output */
    SAMPLE : array [1..k] of 1..n;
    /* local */
    KI,SUMKI : array [1..k] of 1..k;
    GROUP : array [1..k] of 1..⌈n/k⌉
begin
I1: if ⌈n/k⌉.k <> n then breakpt := RAND[1,n];

I2: for i := p to p+k-1 pardo
        GROUP[i-p+1] := RAND[1,⌊n/k⌋]
    dopar;

I3: par_findki ( p,k ; n,k,GROUP ; GROUP,KI,T );

I4: par_find_parsum ( p,T ; KI,T ; SUMKI );

I5: for i := 1 to T pardo
    begin
        par_random_sample ( p+SUMKI[i-1],KI[i] ; KI[i],k,
                            SUMKI[i-1]+1,SUMKI[i] ; SAMPLE );
        /* translate the items of group i to their global indices */
        for j := p+SUMKI[i-1] to p+SUMKI[i]-1 pardo
            SAMPLE[j-p+1] := SAMPLE[j-p+1] + (GROUP[i]-1) * k
        dopar
    end
    dopar;

I6: if ⌈n/k⌉.k <> n then
        par_modify_sample ( p,k ; k,n,breakpt,SAMPLE ; SAMPLE )
end;

The procedure par_random_sample presents the algorithm
for selecting a random sample SAMPLE1[1:k] of size k
out of n distinct items. It calls the procedures
par_rsamp_case1 and par_rsamp_case2, depending on whether
n ≤ k^2 or n > k^2.

proc par_random_sample ( p,k ; k,n,s,e ; SAMPLE1 );
var
    /* input */
    k,n,s,e : integer;
    /* output */
    SAMPLE1 : array [1..k] of 1..n;
    /* local */
    SAMPLE : array [1..k] of 1..n;
begin
    if ⌈n/k⌉ <= k then
        par_rsamp_case1 ( p,k ; n,k ; SAMPLE1 )
    else
    begin
        par_rsamp_case2 ( p,k ; k,n,s,e ; SAMPLE );
        for i := p to p+e-s pardo
            SAMPLE1[i-p+s] := SAMPLE[i-p+1]
        dopar
    end
end;

LEMMA 2.5

The sample of k out of n items produced by algorithm
par_random_sample is uniformly random.

Proof:

If ⌈n/k⌉ ≤ k then by LEMMA 2.1 the sample is random.
Therefore, the only case that needs to be considered is when
⌈n/k⌉ > k at each stage of recursion.

If, at every stage, the number of items being selected
from any group is a factor of the size of the group, i.e.
n = ⌈n/k⌉.k, then by symmetry any sample generated follows a
uniform distribution. If at any stage the number of items
to be selected is not a factor of the size of the group, a
contiguous block of (n - ⌊n/k⌋.k) items is rejected. Since
one such block out of the possible n is chosen at random
(assuming circularity among the items, i.e. item 1 follows
item n), we are assured of a uniform distribution.

□


LEMMA 2.6

The algorithm par_random_sample for selecting a sample
of k items out of n at random runs in O(log k) time with
O(k) processors using O(k) memory.

Proof:

Let T(k,n) be the time complexity for the random sampling
problem. In case ⌈n/k⌉ ≤ k at any stage of recursion,
the algorithm par_rsamp_case1 is invoked and the process
is completed in O(log k) time with O(k) processors. So the
worst case is when ⌈n/k⌉ > k (or k^2 < n) at all stages of
recursion (except the last), in which case every time the
procedure par_rsamp_case2 is invoked. Therefore, the worst
case value of the number of elements to be selected at (say)
the i-th stage is the square root of the number of elements to be
selected at the (i-1)-th stage. All steps of the procedure
par_rsamp_case2 except the recursive step can easily be
implemented in O(log k) time. Thus,

    T(k,n) := T(sqrt(k),k) + O(log k)

or, T(k,n) := O(log k)

Since at any stage of recursion the total number of
items being selected does not exceed k, the processor
complexity of the algorithm is O(k).

Also, the space complexity of the algorithm is given
by:

    S(k) := S(sqrt(k)) + O(k)
or, S(k) := O(k)

□
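The step from the time recurrence to its O(log k) solution can be made explicit by unrolling it. The derivation below is a sketch in which c denotes the constant hidden in the O(log k) term and the cost of the final call to par_rsamp_case1 is assumed to be absorbed into the additive O(1); both are notational assumptions of this example.

```latex
\begin{align*}
T(k) &\le T\bigl(k^{1/2}\bigr) + c\log k \\
     &\le T\bigl(k^{1/4}\bigr) + c\log k^{1/2} + c\log k \\
     &\le c\log k\,\Bigl(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\Bigr) + O(1) \\
     &\le 2c\log k + O(1) \;=\; O(\log k).
\end{align*}
```

The same geometric argument applied to S(k) gives k + k^{1/2} + k^{1/4} + ... = O(k) for the space bound.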
2.3. ANOTHER APPROACH TO PARALLEL RANDOM SAMPLING

In this section, an algorithm par_random_sample2 for
selecting k items at random out of n distinct items is
presented which uses an approach different from the divide
and conquer approach of the algorithm par_random_sample of
the last section.

GOODMAN & HEDETNIEMI [GH77] have presented algorithm
select with a time complexity of O(n) and space complexity
O(n). The algorithm first initializes an array ITEM[1:n]
with the integers 1 through n and a pointer 'last' to the
last element, i.e. n. Next, the algorithm proceeds in k
steps. At each step an item is selected at random from
ITEM[1:last], the item is replaced by the element pointed to
by 'last', and the 'last' pointer is decremented by 1. The
algorithm has been modified by ERNVALL & NEVALAINEN [EN82]
such that their algorithm has a time complexity of O(k^2) and
space complexity O(k). TEUHOLA & NEVALAINEN [TN82] made
further improvements to the space complexity. Their algorithm
uses a chain of substitutions to get the indices of
the sample. The algorithm uses a linear search technique and
hence O(k^2) computations. GUPTA & BHATTACHARJEE [GB84] use
sorting, searching and interchange methods to give an algorithm
with time complexity O(k.log k). The algorithm needs
two passes of sort, search and interchange.
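The algorithm select, as described in the paragraph above, can be sketched in Python as follows. The function signature and the use of Python's `random.Random` as the source of RAND are assumptions of this example; the O(n) initialisation of ITEM is what the later algorithms set out to avoid.

```python
import random


def select(n, k, rng=None):
    """Draw k distinct items from 1..n using an explicit ITEM array (O(n) space)."""
    rng = rng or random.Random()
    item = list(range(1, n + 1))      # ITEM[1:n], stored 0-based
    last = n
    sample = []
    for _ in range(k):
        j = rng.randint(1, last)      # position chosen at random in ITEM[1:last]
        sample.append(item[j - 1])
        item[j - 1] = item[last - 1]  # overwrite with the element pointed to by 'last'
        last -= 1
    return sample


if __name__ == "__main__":
    print(select(10, 4, random.Random(5)))
```

Because a selected position is immediately overwritten by a not yet selected item, no item can be drawn twice.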

The algorithm isel, which is due to GUPTA & BHATTACHARJEE,
manipulates a set A =
{ (SELECT[i],LAST[i],STAGE[i]) | i := 1,..,k }, where initially

    SELECT[i] - contains a random integer RAND[1,n-i+1],
    LAST[i]   - contains the index of the last item (n-i+1),
    STAGE[i]  - contains i.

The output of the algorithm is SELECT[1:k], which contains
the random sample of k out of n items. It has been
shown in [GB84] that the algorithm isel generates the same
sequence of items as algorithm select.


proc isel ( k,n ; SELECT );
var
    /* input */
    k,n : integer;
    /* output */
    SELECT : array [1..k] of 1..n;
    /* local */
    LAST,STAGE : array [1..k] of 1..n;
    i,j,l : integer
begin
I1: for i := 1 to k do
    begin
        SELECT[i] := RAND[1,n-i+1];
        LAST[i] := n-i+1;
        STAGE[i] := i
    end;

I2: /* sort */
    Arrange the elements of set A such that for i < j
        either SELECT[i] > SELECT[j]
        or SELECT[i] = SELECT[j] and
           LAST[i] > LAST[j]

I3: /* search and interchange in forward direction */
    for i := 1 to k-1 do
        if SELECT[i] = SELECT[i+1] then
        begin
            interchange ( SELECT[i],LAST[i] );
            interchange ( STAGE[i],STAGE[i+1] )
        end;

I4: /* sort */
    Arrange the elements of set A such that for i < j
        either SELECT[i] > SELECT[j]
        or SELECT[i] = SELECT[j] and
           LAST[i] > LAST[j]

I5: /* search and interchange in backward direction */
    for i := 1 to k-1 do
    begin
        j := k-i+1;
        find l, l in [1,k-i], SELECT[j] = SELECT[l]
        if found then
        begin
            interchange ( SELECT[l],LAST[l] );
            interchange ( STAGE[j],STAGE[l] )
        end
    end

I6: Sort the elements of A such that for i < j
        STAGE[i] < STAGE[j]
end;


The algorithm par_random_sample2 is based upon an
interesting observation about GUPTA & BHATTACHARJEE's
algorithm isel, which is stated in the following lemma.

LEMMA 2.7 


After steps I1-I3 of algorithm isel, the sample
SELECT[i], 1 ≤ i ≤ k, is such that each item occurs at most
twice in the sample.

proof:

In step I1, SELECT[i] is set to RAND[1,n-i+1] and
LAST[i] to n-i+1. Thus the elements of LAST[1:k] are all
distinct. If any item has more than one occurrence in
SELECT, after the sorting step all the occurrences are
brought together. In step I3, each repeated occurrence of any
item is replaced by the corresponding element of LAST. There
is at most one occurrence of any SELECT[i] in LAST.
Therefore, after the interchange of step I3, each element of
array SELECT may be found to repeat at most once.


□ 
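
The claim of LEMMA 2.7 can be checked experimentally. The following
Python sketch is a sequential stand-in for steps I1-I3 only; the
function name isel_pass1 and the use of Python's random module are
assumptions made for illustration and are not part of the original
algorithm.

```python
import random
from collections import Counter

def isel_pass1(k, n, rng):
    """Sequential sketch of steps I1-I3 of isel (hypothetical helper)."""
    # I1: SELECT[i] is uniform on [1, n-i+1]; LAST[i] = n-i+1, all distinct.
    pairs = [(rng.randint(1, n - i + 1), n - i + 1) for i in range(1, k + 1)]
    # I2: order by SELECT descending, ties broken by LAST descending.
    pairs.sort(reverse=True)
    select = [s for s, _ in pairs]
    last = [l for _, l in pairs]
    # I3: an entry equal to its successor is replaced by its LAST value.
    # The successor is still unmodified when it is compared.
    for u in range(k - 1):
        if select[u] == select[u + 1]:
            select[u] = last[u]
    return select

sample = isel_pass1(10, 15, random.Random(1))
# By LEMMA 2.7 no item occurs more than twice.
assert max(Counter(sample).values()) <= 2
```

Since no item occurs more than twice, at least k/2 of the k entries
are distinct, which is the property the two-pass scheme relies on.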



The above lemma suggests an algorithm which can solve
the random sampling problem in two passes of steps I1-I3. In
the first pass at least k/2 (and at most k) items are
selected for the sample. Out of the remaining items,
selecting another sample whose size is twice the number
of items still required yields the remaining items. Combining
the results of the two passes we get a random sample of size
k out of n items. It is assumed that the sample generated by
the first pass is withdrawn before starting the second pass.
(In the actual implementation of the second pass, one may
first generate a sample consisting of the ranks of the
items of the second sample in the initial set of n items
from which the first sample has been withdrawn. Later, the
second sample can be generated from among the gaps between
successive items of the first sample.) This ensures that
none of the items selected in the first pass are selected in
the second. The second pass necessitates that (at most) k
items be remaining after the first pass (which selects at
least k/2 items). This imposes a limit on n, i.e. n ≥ 3k/2 to
start with. However this is not a serious constraint because
in case n < 3k/2, the problem of selecting k out of n items
can be converted to the problem of rejecting n-k items out
of n.

Based upon the above discussion a parallel algorithm 
can be designed. The algorithm involves the following major 
steps : 

1. Generation of the first sample consisting of at
least k/2 items in pass 1.

2. Generation of the ranks of the items to be selected
for the second sample under the assumption that the
first sample has already been withdrawn.

3. Generation of actual items present in the second 
sample from the ranks generated in step 2. 

4. In case n < 3k/2 at the start, finding the comple- 
ment set of the sample generated (since in this case 
the sample generated has to be rejected). 

A parallel implementation of the above algorithm is 
presented next: 

The main algorithm is presented in procedure
par_random_sample2. This in turn calls the procedures
par_rsamp2_pass1, par_rsamp2_pass2, par_find_sample2 and
par_find_complement, which correspond to the steps 1-4 mentioned
above. The procedure par_rsamp2_pass1 generates the sample
of the first pass SAMP1[1:npass1], where 'npass1' is the
number of distinct items selected by pass 1. It in turn
calls the procedure par_compress, presented in chapter 1,
to compress the distinct elements of SAMP1[1:k] into
SAMP1[1:npass1]. The second pass of the algorithm may produce
more than the required number of items. This condition
is taken care of by the procedure par_find_endpt inside
par_rsamp2_pass2. The procedure par_find_sample2 calls
the procedure par_find_actindx which, for each element of
the second sample, does a binary search over the gaps
between the successive items of the sample generated by the
first pass to find the actual item corresponding to it.


proc par_rsamp2_pass1 ( p,k ; k,n ; SAMP1, npass1 );
var
    /* input */
    k,n : integer;

    /* output */
    SAMP1 : array [1..npass1] of 1..n;
    npass1 : integer;

    /* local */
    i,u : integer;
    SELECT, LAST : array [1..k] of 1..n

begin
I1: /* initialise */
    for i := p to p+k-1 pardo
    begin
        u := i-p+1;
        SELECT[u] := RAND[1,n-u+1];
        LAST[u] := n-u+1
    end
    dopar;

I2: /* sort */
    Arrange the elements of set
    A = {(SELECT[i], LAST[i]) | i := 1..k} such that
    for i < j either SELECT[i] > SELECT[j]
              or SELECT[i] = SELECT[j] and
                 LAST[i] > LAST[j];

I3: for i := p to p+k-2 pardo
    begin
        u := i-p+1;
        if SELECT[u] = SELECT[u+1] then
            SELECT[u] := LAST[u];
        SAMP1[u] := SELECT[u]
    end
    dopar;
    /* the last element has no successor to compare with */
    SAMP1[k] := SELECT[k];

I4: par_compress ( p,n ; SAMP1,k,n ; SAMP1, npass1 )
end;


The procedure par_rsamp2_pass2 requires that the sample
SAMP2 generated by it should have exactly 'npass2' item
indices. However, steps I1-I5 of the procedure may generate
more than 'npass2' distinct items. So, the procedure
par_find_endpt is run to find 'endpt' such that
SAMP2[1:endpt] yields a sample of exactly 'npass2' distinct
items.

proc par_find_endpt ( p,newn ;
          SELECT, STAGE, newn, npass2, n ; endpt );
var
    /* input */
    SELECT : array [1..newn] of 1..n;
    STAGE : array [1..newn] of 1..newn;
    npass2,newn,n : integer;

    /* output */
    endpt : integer;

    /* local */
    i,u : integer;
    BIT : array [1..newn] of 0..newn;  /* holds running counts after I4 */
    FAR : array [1..newn] of 1..newn

begin
I1: Sort the elements of the set A =
    {(SELECT[i], STAGE[i]) | i := 1,..,newn} such that
    for i < j either SELECT[i] > SELECT[j]
              or SELECT[i] = SELECT[j] and
                 STAGE[i] < STAGE[j];

I2: for i := p+1 to p+newn-1 pardo
    begin
        u := i-p+1;
        if SELECT[u] = SELECT[u-1] then
            BIT[STAGE[u]] := 0
        else
            BIT[STAGE[u]] := 1
    end
    dopar;
    BIT[STAGE[1]] := 1;

I3: for i := p to p+newn-2 pardo
        FAR[i-p+1] := i-p+2
    dopar;
    FAR[newn] := newn;

I4: for ⌈log newn⌉ steps do
        for i := p to p+newn-1 pardo
        begin
            u := i-p+1;
            BIT[FAR[u]] := BIT[u] + BIT[FAR[u]];
            FAR[u] := FAR[FAR[u]]
        end
        dopar;

I5: for i := p to p+newn-1 pardo
        if BIT[i-p+1] = npass2 then endpt := i-p+1
    dopar
end;
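
The idea behind par_find_endpt can be illustrated sequentially. In the
Python sketch below, the function name find_endpt is hypothetical,
itertools.accumulate stands in for the pointer-jumping prefix sum of
steps I3-I4, and returning the smallest qualifying position is an
assumption made for definiteness.

```python
from itertools import accumulate

def find_endpt(select, npass2):
    """Smallest prefix length of `select` holding exactly npass2 distinct values."""
    newn = len(select)
    # I1: order positions by value (descending), earlier draws first within ties.
    order = sorted(range(newn), key=lambda s: (-select[s], s))
    # I2: BIT marks the earliest draw of each distinct value.
    bit = [0] * newn
    for pos, s in enumerate(order):
        is_first = pos == 0 or select[order[pos - 1]] != select[s]
        bit[s] = 1 if is_first else 0
    # I3-I4: running count of distinct values in draw order.
    prefix = list(accumulate(bit))
    # I5: first position at which the count reaches npass2.
    for idx, total in enumerate(prefix, start=1):
        if total == npass2:
            return idx
    return None  # fewer than npass2 distinct values were drawn

assert find_endpt([5, 3, 5, 2, 3, 7], 3) == 4
```

In the example the draws 5, 3, 5, 2 already contain three distinct
values, so only the first four entries are kept.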



The procedure par_rsamp2_pass2 gives the algorithm
for the second pass wherein SAMP2[1:npass2] is selected.


proc par_rsamp2_pass2 ( p,np ; npass2,n ; SAMP2 );
var
    /* input */
    npass2,n : integer;

    /* output */
    SAMP2 : array [1..2*npass2] of 1..n;

    /* local */
    LAST, SELECT : array [1..2*npass2] of 1..n;
    STAGE : array [1..2*npass2] of 1..2*npass2;
    i,u,newn,tnum,endpt : integer

begin
I1: newn := 2 * npass2;

I2: for i := p to p+newn-1 pardo
    begin
        u := i-p+1;
        SELECT[u] := RAND[1,n-u+1];
        LAST[u] := n-u+1
    end
    dopar;

I3: Arrange elements of the set A =
    {(SELECT[i], LAST[i]) | i := 1,..,newn} such that
    for i < j either SELECT[i] > SELECT[j]
              or SELECT[i] = SELECT[j] and
                 LAST[i] > LAST[j];

I4: for i := p to p+newn-2 pardo
    begin
        u := i-p+1;
        if SELECT[u] = SELECT[u+1] then
            SELECT[u] := LAST[u]
    end
    dopar;

I5: for i := p to p+newn-1 pardo
    begin
        u := i-p+1;
        SAMP2[u] := SELECT[u];
        STAGE[u] := u
    end
    dopar;

I6: par_find_endpt (p,newn ; SELECT, STAGE, newn, npass2, n ; endpt);
I7: par_compress (p,endpt ; SAMP2, endpt, n ; SAMP2, tnum);
end;


The sample SAMP2 generated by par_rsamp2_pass2 in
step I6 of the main procedure (par_random_sample2) does
not contain the actual indices of items but their ranks in
the items remaining after the first sample SAMP1 has been
removed. The procedure par_find_sample2 finds the actual
item indices for the items selected in the second pass.


proc par_find_sample2 (p,npass1 ;
          npass1, npass2, n, k, SAMP1, SAMP2 ; SAMP2);
var
    /* input */
    npass1,npass2,n : integer;
    SAMP1, SAMP2 : array [0..k+1] of 1..n;

    /* output */
    SAMP2 : array [1..npass2] of 1..n;

    /* local */
    i,u : integer;
    GAP : array [0..npass1+1] of 1..n-npass1;
    FAR : array [1..npass1+1] of 1..npass1+1

begin
I1: for i := p+1 to p+npass1-1 pardo
    begin
        u := i-p+1;
        GAP[u] := SAMP1[u] - SAMP1[u-1]
    end
    dopar;
    GAP[0] := 0;
    GAP[1] := SAMP1[1] - 1;
    GAP[npass1+1] := n+npass1-SAMP1[npass1];

I2: for i := p to p+npass1-1 pardo
        FAR[i-p+1] := i-p+2
    dopar;
    FAR[npass1+1] := npass1+1;

I3: for ⌈log(npass1+1)⌉ steps do
        for i := p to p+npass1 pardo
        begin
            u := i-p+1;
            GAP[FAR[u]] := GAP[FAR[u]] + GAP[u];
            FAR[u] := FAR[FAR[u]]
        end
        dopar;

I4: par_find_actindx (p,npass2 ;
          GAP, SAMP2, SAMP1, npass2, npass1, n ; SAMP2)
end;


The procedure par_find_actindx, for each item index
in SAMP2, does a binary search over the gaps present between
successive elements in the sorted SAMP1 to find the actual
index corresponding to it.
proc par_find_actindx ( p,npass2 ; GAP, SAMP2, SAMP1,
          npass2, npass1, n ; SAMP2 );
var
    /* input */
    GAP : array [0..k] of 1..n;
    SAMP1 : array [1..k+1] of 1..n;
    SAMP2 : array [1..npass2] of 1..n;
    npass2,npass1,n : integer;

    /* output */
    SAMP2 : array [1..k] of 1..n;

    /* local */
    LO,UP,MI : array [1..npass2] of 1..npass1+1

begin
    for i := p to p+npass2-1 pardo
    begin
        u := i-p+1;
        LO[u] := 1;
        UP[u] := npass1+1;
        while LO[u] <= UP[u] do
        begin
            MI[u] := ⌊(LO[u] + UP[u])/2⌋;
            if ( GAP[MI[u]] >= SAMP2[u] ) and
               ( GAP[MI[u]-1] < SAMP2[u] ) then
            begin
                SAMP2[u] := SAMP1[MI[u]] -
                            (GAP[MI[u]]-SAMP2[u]) - 1;
                return
            end
            else
                if SAMP2[u] > GAP[MI[u]] then
                    LO[u] := MI[u] + 1
                else
                    UP[u] := MI[u] - 1
        end
    end
    dopar
end;
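
The mapping from a rank among the remaining items to an actual item
index can be sketched sequentially as follows. This is an illustration
under stated assumptions rather than a transcription of
par_find_actindx: the function name ranks_to_items is hypothetical,
the first sample is assumed to be given in ascending order, a sentinel
n+1 closes the last gap, and bisect_left plays the role of the binary
search.

```python
from bisect import bisect_left

def ranks_to_items(samp1_sorted, ranks, n):
    """Map ranks among the items left after removing samp1 to item indices."""
    # bounds[m] is the m-th withdrawn item; n+1 is a sentinel for the last gap.
    bounds = list(samp1_sorted) + [n + 1]
    # cum[m] = number of remaining (non-withdrawn) items smaller than bounds[m].
    cum = [b - 1 - m for m, b in enumerate(bounds)]
    items = []
    for r in ranks:
        m = bisect_left(cum, r)          # first gap with cum[m] >= r
        # The item of rank cum[m] is bounds[m]-1; step back cum[m]-r places.
        items.append(bounds[m] - 1 - (cum[m] - r))
    return items

# With items 3, 5, 9 withdrawn from 1..10, ranks 1..7 are 1,2,4,6,7,8,10.
assert ranks_to_items([3, 5, 9], [1, 2, 3, 4, 5, 6, 7], 10) == [1, 2, 4, 6, 7, 8, 10]
```

Every result lies strictly inside a gap of the first sample, which is
why the combined sample contains no repeated item.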


In case n < 3k/2, the problem of selection is converted
to the problem of rejecting n-k elements out of n. In
such a situation, at the end of the algorithm, the procedure
par_find_complement finds the complement of the sample
generated with respect to the total population. It is to be
noted that whenever this procedure is run O(n) = O(k).
proc par_find_complement (p,n ; SAMPLE, k, n ; SAMPLE);
var
    /* input */
    SAMPLE : array [1..n-k] of 1..n;
    k,n : integer;

    /* output */
    SAMPLE : array [1..n-k] of 1..n;

    /* local */
    NSAMPLE : array [1..n] of 1..n;
    BIT : array [1..n] of 0..1;
    i,u : integer

begin
I1: for i := p to p+n-1 pardo
    begin
        u := i-p+1;
        BIT[u] := 1;
        NSAMPLE[u] := u
    end
    dopar;

I2: for i := p to p+k-1 pardo
        BIT[SAMPLE[i-p+1]] := 0
    dopar;

I3: Sort the set
    B := {(BIT[i], NSAMPLE[i]) | i := 1,..,n} such that
    for i < j, BIT[i] >= BIT[j];

I4: for i := p to p+n-k-1 pardo
    begin
        u := i-p+1;
        SAMPLE[u] := NSAMPLE[u]
    end
    dopar
end;
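
A sequential sketch of the complement step is given below. The
function name find_complement is hypothetical, and a linear scan over
the BIT array replaces the sort of step I3, which is an assumed
simplification for the serial setting.

```python
def find_complement(sample, n):
    """Return the items of 1..n that are not in `sample`, in increasing order."""
    bit = [1] * (n + 1)        # bit[i] = 1 while item i is not rejected; bit[0] unused
    for item in sample:
        bit[item] = 0          # items in the generated sample are rejected
    return [i for i in range(1, n + 1) if bit[i] == 1]

assert find_complement([2, 5], 6) == [1, 3, 4, 6]
```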


The procedure par_random_sample2 presents the algorithm
for selecting a random sample SAMPLE[1:k] of size k
out of n distinct items.


proc par_random_sample2 (p,k ; k,n ; SAMPLE );
var
    /* input */
    k,n : integer;

    /* output */
    SAMPLE : array [1..k] of 1..n;

    /* local */
    i,npass1,npass2,flag : integer;
    SAMP1,SAMP2 : array [1..k] of 1..n

begin
I1: if n < (3*k)/2 then
    begin
        flag := 1;
        k := n-k
    end
    else
        flag := 0;

I2: par_rsamp2_pass1 (p,k ; k,n ; SAMP1, npass1);

I3: n := n-npass1;

I4: npass2 := k-npass1;

I5: if npass2 > 0 then
    begin
I6:     par_rsamp2_pass2 (p,2*npass2 ; npass2,n ; SAMP2);
I7:     par_find_sample2 (p,npass1 ;
              npass1, npass2, n, SAMP1, SAMP2 ; SAMP2)
    end;

I8: for i := p to p+npass1-1 pardo
        SAMPLE[i-p+1] := SAMP1[i-p+1]
    dopar;

I9: for i := p to p+npass2-1 pardo
        SAMPLE[npass1+i-p+1] := SAMP2[i-p+1]
    dopar;

I10: if flag = 1 then
        par_find_complement (p,n ; SAMPLE, k, n ; SAMPLE)
end;

LEMMA 2.8

All the items selected for the sample by the procedure
par_random_sample2 are distinct and lie between 1 and n.

proof:

Pass 1 of the algorithm (par_rsamp2_pass1) is
essentially the same as steps I1-I3 of algorithm isel, and
thus by LEMMA 2.7 it generates a sample (SAMP1) of size
'npass1' (k/2 ≤ npass1 ≤ k) such that each index selected is
distinct. The remaining (k-npass1) items are selected in
Pass 2 (par_rsamp2_pass2) by generating 2 * (k-npass1)
random integers in the range [1,n-npass1]. In case more than
k-npass1 distinct integers are generated, we consider only
the first 'endpt' processors (i.e. those numbered 1..endpt)
such that the random integers generated by these form a
sample of exactly (k-npass1) distinct item indices. Thus, after
the two passes are over we have two samples of sizes npass1
and (k-npass1) respectively such that each contains distinct
item indices. Next, the two samples (stored in sorted order
in SAMP1 and SAMP2 respectively) are combined as follows:

The assumption is that SAMP1 is withdrawn before SAMP2,
i.e. SAMP2[i] gives the position of the i th item of the
second sample after the first sample has been withdrawn from
the pool of n items. The actual item index corresponding to
each SAMP2[i] is found as follows: Gaps between successive
elements of SAMP1 are found. By binary search the actual
gap in which each SAMP2[i] lies (and hence the actual index
corresponding to SAMP2[i]) is found. Since the indices thus
selected can lie only in the gaps of SAMP1, they are
distinct from the sample items in SAMP1. Thus in the combined
sample, each index is different from the other.

Also, the maximum value of any SAMP2[i] is n-npass1.
Since the size of SAMP1 is 'npass1', the maximum index
corresponding to any SAMP2[i] is n-npass1+npass1 = n. The
minimum value of any SAMP2[i] is 1 and thus the minimum item
index corresponding to it is 1+0 = 1.

□


LEMMA 2.9

The algorithm par_random_sample2 requires O(log k)
time using O(k) processors.

proof:

Each of the steps of the procedures
par_random_sample2, par_rsamp2_pass1, par_rsamp2_pass2,
par_compress, par_find_endpt, par_find_actindx,
par_find_complement, par_find_sample2 may be executed in at
most O(log k) time with O(k) processors. For parallel
sorting one may use the algorithm in [CO86], which runs in
O(log k) time with O(k) processors.

□

2.4. RANDOM PERMUTATION IN PARALLEL 

With the help of the random sampling algorithm one can
easily design an algorithm for getting a Random Permutation
of n items. The problem is defined as follows:

Given a sequence of n items A = (a_1, a_2, .., a_n),
generate a permutation of the n items such that the position of
occurrence of each item is random. Without loss of generality
one may consider the n items to be the integers 1..n.

One may generate a random permutation by generating the
k th permutation in a lexicographic ordering, such that k is
chosen at random. However, this process is inefficient since
the best known sequential algorithm takes O(n log n) time
[GB83]. One may generate a random permutation in O(n) time
on a serial computer by a sequence of n interchanges such
that the i th interchange swaps A[i] and A[l], where l is
some index selected at random to the right of i [NW75].
proc rand_perm (A, n ; A);
var
    /* input */
    n : integer;
    A : array [1..n] of 1..n;

    /* output */
    A : array [1..n] of 1..n;

    /* local */
    i,l,temp : integer;
    r : real

begin
I1: for i := 1 to n do
    begin
        r := RAND[0,1];  /* r is a random variate
                            in the open interval (0,1) */
        l := i + ⌊r * (n+1-i)⌋;
        temp := A[l];
        A[l] := A[i];
        A[i] := temp
    end
end;
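
The sequence of interchanges can be written out as the short Python
sketch below. The function name rand_perm mirrors the procedure above,
while the 0-based indexing and the use of Python's random module are
assumptions of the sketch.

```python
import random

def rand_perm(a, rng):
    """Serial random permutation by n interchanges (0-based indices)."""
    a = list(a)
    n = len(a)
    for i in range(n):
        # rng.random() lies in [0, 1), so l is an index in i..n-1.
        l = i + int(rng.random() * (n - i))
        a[i], a[l] = a[l], a[i]
    return a

perm = rand_perm(range(1, 11), random.Random(7))
assert sorted(perm) == list(range(1, 11))
```

Each interchange depends on the array left behind by the previous one,
which is the chain of dependencies that makes this scheme hard to run
in parallel.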


The above approach however is not suitable for parallel 
computation because we may get a chain of interchanges which 
have to be done sequentially. But given the results of the 
last two sections, one may generate a random permutation in 
polylogarithmic time using procedure par_random_perm (A,1,n). The
algorithm proceeds by routing n/2 randomly selected items 
into the first half of the array A and putting the remaining 
items into the second half of A. The algorithm is then 
called recursively for both the halves of A simultaneously. 
In case n is odd, one item selected at random is made to be 
the last element. 


proc par_random_perm (p,j-i+1 ; A,i,j ; A);
var
    /* input */
    i,j : integer;
    A : array [1..n] of 1..n;

    /* output */
    A : array [1..n] of 1..n;

    /* local */
    A1,A2 : array [1..n] of 1..n;

begin
I1: if (j-i+1) is odd then
    begin
        term := RAND[i,j];
        temp := A[term];
        A[term] := A[j];
        A[j] := temp;
        j := j-1
    end
    else
    begin
I2:     k := ⌊(j-i+1)/2.0⌋;

I3:     par_random_sample (p,k ; k,j-i+1,i,j ; A1);

I4:     A2 := A-A1;  /* A2 is an array containing those
                        elements of A which are not in A1 */
I5:     for m := p to p+k-1 pardo
        begin
            u := m-p;
            A[i+u] := A1[u];
            A[k+1+u] := A2[u]
        end;

I6:     parbegin
            par_random_perm (p,k ; A,i,k ; A);
            par_random_perm (p,j-k ; A,k+1,j ; A)
        parend
    end
end;


LEMMA 2.10

The procedure par_random_perm takes O((log n)^2) time
to compute a random permutation of (1,..,n) using O(n)
processors.


proof:

Steps I2, I3 take O(log n) time using O(n) processors,
while step I1 takes constant time. Step I4 constitutes the
recursive step. Thus, the time complexity of the algorithm
is given by:

     T(n) = T(n/2) + O(log n)
or,  T(n) = O((log n)^2)
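
The recursive halving scheme can be illustrated by the sequential
Python sketch below. It is a stand-in under stated assumptions: the
function name random_perm is hypothetical, rng.sample replaces the
parallel random sampling procedure, the odd case continues with the
remaining even-sized part, and the two recursive calls run one after
the other instead of simultaneously.

```python
import random

def random_perm(items, rng):
    """Permute by routing a random half to the front, then recursing on both halves."""
    items = list(items)
    n = len(items)
    if n <= 1:
        return items
    tail = []
    if n % 2 == 1:
        # Odd length: one randomly chosen item becomes the last element.
        t = rng.randrange(n)
        items[t], items[-1] = items[-1], items[t]
        tail = [items.pop()]
    k = len(items) // 2
    # Stand-in for the random sampling step: choose k positions at random.
    chosen = set(rng.sample(range(len(items)), k))
    first = [x for idx, x in enumerate(items) if idx in chosen]
    second = [x for idx, x in enumerate(items) if idx not in chosen]
    # The two halves are independent; the parallel algorithm handles them together.
    return random_perm(first, rng) + random_perm(second, rng) + tail

perm = random_perm(range(1, 12), random.Random(3))
assert sorted(perm) == list(range(1, 12))
```

The recursion has about log n levels, and in the parallel setting each
level costs O(log n) for the sampling step, which is where the
O((log n)^2) bound comes from.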
2.5. CONCLUSION

In this chapter two parallel algorithms for generating
a Random Sample of size k out of a population of size n are
presented. Both the algorithms, par_random_sample
and par_random_sample2, proceed in O(log k) time when O(k)
processors are available. The cost (product of time and
processor complexities) is the same as that of the best known
sequential algorithm using O(k) space, which runs in
O(k log k) time.

The random permutation of k objects can be generated in
O((log k)^2) time with O(k) processors using procedure
par_random_perm. However, if a random permutation of k out
of n objects is required, one can first generate a random
sample of size k out of n.



CHAPTER 3 

GENERATION OF RANDOM UNLABELED ROOTED TREE &

RANDOM BINARY TREE

3.1. INTRODUCTION

The problem of generation of trees, as combinatorial 
objects, has received substantial attention in literature. 
Trees may be rooted or unrooted, labeled or unlabeled. 
Rooted trees may be ordered or unordered. There are other 
restricted classes of trees such as binary, k-ary etc. Con- 
siderable work has been done in generating trees under vari- 
ous coding schemes [R78] , [ZR79] , [Z80] . Since there may be 
an exponential number of trees belonging to any particular 
class, a natural question that arises is how we can generate 
a random tree belonging to any such class. If the exact 
number of trees in any class are known then selection of a 
random tree can be done via a random number generator which 
generates a random number between 1 and |T(n)|, where |T(n)|
is the total number of trees in the given class. The tree
corresponding to the number can then be found in the given
class of trees. This process of generating a tree
corresponding to a rank in some ordering of trees is termed
unranking. But the unranking procedures are usually
inefficient. So novel procedures need to be developed which
generate a random tree of a particular class. In this
chapter we deal with two such classes of trees, the
unlabeled rooted trees and binary trees.

A parallel algorithm par_ran_tree for the problem of
generating a random unlabeled rooted tree having n nodes
with number of levels (see section 3.2) constrained by some
k, 1 ≤ k ≤ n, and which runs in O(log k) time with O(n)
processors is presented in section 3.2. In section 3.3, two
optimal sequential algorithms ran_bintree1 and
ran_bintree2 are presented for the problem of generating a
random binary tree having n nodes. While ran_bintree1
generates a binary tree with an unconstrained number of levels,
ran_bintree2 generates a tree with number of levels
bounded by some k, ⌈log n⌉ ≤ k ≤ n.

3.2 PARALLEL ALGORITHM FOR GENERATION OF RANDOM UNLABELED
ROOTED TREE

In this section, a parallel algorithm is presented for
generating a random unlabeled rooted tree having n nodes and
k levels. The level of a node is defined to be equal to
one more than the length of the path from the node to the
root of the tree. The number of levels in a tree may be
defined to be the level of the node having the largest level
in the tree. The algorithm can be extended to work for the
case of random labeled tree by taking a random permutation
of the node numbers. The term unlabeled rooted tree is
used because the algorithm does not distinguish between
trees which are isomorphic but have a different labeling for
the n nodes.
3.2.1 RESULTS CONCERNING SEQUENTIAL GENERATION

A sequential algorithm due to Nijenhuis & Wilf [NW75]
exists for the above problem. The algorithm runs as follows:

step1: Select an integer m, 1 ≤ m ≤ n, at random.

step2: Select a divisor d of m, at random.

step3: Generate a random unlabeled tree T having n-m
nodes.

step4: Generate a random unlabeled tree T' having m
nodes.

step5: Make j := m/d copies of T'.

step6: Join the root of T to the roots of each of the
j copies of T'.

Steps 3 & 4 constitute the recursive calls to the
procedure. In the worst case, the depth of recursion is O(n)
(when m = n-1 at each stage of recursion). Thus, at least
in its present form, this algorithm cannot be efficiently
parallelised in a straightforward manner. Also, if one wants 
a tree with k levels one may have to run this algorithm more 
than once. However, the parallel algorithm of section 3.2.2 
directly generates a tree with k levels. 

Similarly, a sequential algorithm exists for generating 
a random labeled tree which is a direct implementation of 
Prüfer's proof of Cayley's theorem [NW75]. However, it seems
that this also cannot be 'efficiently' parallelised. 
3.2.2 ALGORITHM par_ran_tree

The algorithm presented here makes use of the parallel
random sampling algorithm of the previous chapter to
partition the n nodes into k parts, at random. The first part of
the partition is set equal to 1 since, at the first level
there should be only one node which is the root of the tree.
Besides, each part must be greater than 0, otherwise the
number of levels in the tree becomes less than k. So,
initially one node is assigned to each of the remaining k-1
parts. For partitioning the remaining n-k nodes into k-1
parts the algorithm proceeds as follows: Let us assume that
we have n-2 in unary form (i.e. as a sequence of n-2 ones).
Out of these we select k-2 1's at random (using
par_random_sample2). These act as the boundaries for the
k-1 parts into which n-k has to be divided. The number of
1's between the i th boundary and the (i+1) th boundary is the
size of the (i+2) th part. It is assumed that the second and
the k th parts are delimited on their left and right
respectively by boundaries situated at positions 0 and n-1
respectively. Thus we conclude as follows:

LEMMA 3.1

The n nodes can be partitioned into k parts (each part
greater than 0 and the first part equal to 1), such that the
i th part corresponds to the number of nodes at the i th level
of the tree.

□
LEMMA 3.1 is implemented by the procedure
par_partition. This procedure in turn calls the procedure
par_random_sample2 presented in the last chapter for
finding a random sample.


proc par_partition ( p,n ; n,k ; PART, PARTSUM );
var
    /* input */
    n,k : integer;

    /* output */
    PART : array [1..k] of 1..n-k+1;
    PARTSUM : array [1..k] of 1..n;

    /* local */
    BOUND : array [1..k-2] of 1..n-2;
    u,i : integer

begin
I1: par_random_sample2 ( p,k-2 ; k-2,n-2 ; BOUND );
I2: for i := p to p+k-1 pardo
    begin
        u := i-p+1;
I3:     if u = 1 then
        begin
            PART[u] := 1;
            PARTSUM[u] := 1
        end
        else
        if u = 2 then
        begin
            PART[u] := BOUND[1];
            PARTSUM[u] := BOUND[1] + 1
        end
        else
        if u = k then
        begin
            /* PART[u] = PARTSUM[u] - PARTSUM[u-1] */
            PART[u] := n - 1 - BOUND[k-2];
            PARTSUM[u] := n
        end
        else
        begin
            PART[u] := BOUND[u-1] - BOUND[u-2];
            PARTSUM[u] := BOUND[u-1] + 1
        end
    end
    dopar
end;
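
The boundary construction behind LEMMA 3.1 can be sketched in Python
as follows. The function name random_partition is hypothetical,
rng.sample stands in for the parallel sampling procedure, and the
sketch assumes 2 ≤ k ≤ n.

```python
import random

def random_partition(n, k, rng):
    """Level sizes: part 1 equals 1, parts 2..k are positive and sum to n-1."""
    # Choose k-2 distinct boundary positions among 1..n-2.
    bounds = sorted(rng.sample(range(1, n - 1), k - 2))
    # Positions 0 and n-1 delimit the second and the k th parts.
    cuts = [0] + bounds + [n - 1]
    # Consecutive differences give the sizes of parts 2..k.
    return [1] + [cuts[j + 1] - cuts[j] for j in range(k - 1)]

parts = random_partition(12, 4, random.Random(5))
assert len(parts) == 4 and sum(parts) == 12 and min(parts) >= 1
```

Because the boundaries are distinct and lie strictly between 0 and
n-1, every part is at least 1, so the tree built from the partition
has exactly k levels.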


After obtaining the number of nodes at each level i,
every node at the i th level randomly selects a node in the
(i-1) th level as its parent.
LEMMA 3.2

If, for each node in the i th part of the partition
generated by procedure par_partition, a node in the (i-1) th
level is selected at random as its PARENT, then an unlabeled
rooted tree is generated.

proof:

The PARENT relation, along with the specification of
the root node, completely defines the rooted unlabeled tree.

□

The procedure par_ran_tree implements LEMMA 3.2, i.e.
corresponding to each node at level i, 1 < i ≤ k, it finds
and fixes, at random, another node at level i-1 as the
parent of the former node.


proc par_ran_tree ( p,n ; n,k ; PARENT, root );
var
    /* input */
    n,k : integer;

    /* output */
    root : integer;
    PARENT : array [1..n] of 1..n;

    /* local */
    PART : array [1..k] of 1..n-k+1;
    PARTSUM : array [0..k] of 0..n;
    i,j,u,v : integer

begin
I1: par_partition ( p,n ; n,k ; PART, PARTSUM );
    PARTSUM[0] := 0;

I2: for i := p+1 to p+k-1 pardo
    begin
        u := i-p+1;
        for j := p + PARTSUM[u-1] to
                 p + PARTSUM[u] - 1 pardo
        begin
            v := j-p+1;
            PARENT[v] := RAND[ PARTSUM[u-2]+1,
                               PARTSUM[u-1] ]
        end
        dopar
    end
    dopar;

I3: root := 1;
    PARENT[root] := root
end;
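
The parent selection can be sketched sequentially in Python. The
function name random_level_tree is hypothetical; the level sizes are
taken as an input list, and nodes are numbered level by level starting
from the root, as in the procedure above.

```python
import random
from itertools import accumulate

def random_level_tree(parts, rng):
    """parts[u-1] = number of nodes on level u (parts[0] must be 1).
    Returns PARENT as a list indexed 1..n (index 0 unused)."""
    n = sum(parts)
    partsum = [0] + list(accumulate(parts))   # partsum[u] = nodes on levels 1..u
    parent = [0] * (n + 1)
    parent[1] = 1                             # the root is its own parent
    for u in range(2, len(parts) + 1):
        # Nodes of level u are numbered partsum[u-1]+1 .. partsum[u].
        for v in range(partsum[u - 1] + 1, partsum[u] + 1):
            # The parent is any node of level u-1, chosen at random.
            parent[v] = rng.randint(partsum[u - 2] + 1, partsum[u - 1])
    return parent

parent = random_level_tree([1, 2, 3], random.Random(2))
assert parent[1] == 1 and parent[2] == 1 and parent[3] == 1
assert all(parent[v] in (2, 3) for v in (4, 5, 6))
```

In the parallel version each node draws its parent independently, so
the whole loop nest collapses to a constant number of steps with O(n)
processors.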

LEMMA 3.3

The tree generated by the procedure par_ran_tree is
random.

proof:

Since the initial division of n nodes among k levels
and the subsequent selection of PARENT relation for each
node is done at random, the tree generated is random.

□

LEMMA 3.4

Procedure par_ran_tree generates a random unlabeled
rooted tree having n nodes and k levels in O(log k) time
with O(n) processors.

proof:

The procedure par_partition takes O(log k) time with
O(n) processors for the procedure par_random_sample2. All
other steps can be completed in constant time with O(n)
processors.

□

The above parallel algorithm suggests a new sequential
algorithm which is optimal in time and space complexities,
both of which are O(n). The algorithm may use the sequential
algorithm select [GH77] presented in chapter 2 for
dividing n into k parts. The selection of a parent for each node
can easily be completed in O(n) time.

COROLLARY

A random labeled rooted tree with n nodes and k levels
can be generated in O((log n)^2) time with O(n) processors.

□


3.3. RANDOM BINARY TREE GENERATION

In this section, two sequential algorithms
ran_bintree1 and ran_bintree2 for generating a binary
tree with n nodes are presented. The algorithm ran_bintree1
generates a binary tree with an unconstrained number of
levels, while the algorithm ran_bintree2 generates a tree
whose number of levels is constrained by some k, where
⌈log n⌉ ≤ k ≤ n. Both the algorithms are optimal in their
time and space complexities, which are O(n).

3.3.1 ALGORITHM ran_bintreel 

This algorithm builds up the binary tree level by level 
till all the nodes are exhausted. For each level, the algo- 
rithm uses procedure select to select, at random, the 
nodes which have left sons or right sons at the next level . 

proc select ( k,n ; SAMPLE );
var
    /* input */
    k,n : integer;

    /* output */
    SAMPLE : array [1..k] of 1..n;

    /* local */
    ITEM : array [1..n] of 1..n;
    i,s,last : integer

begin
    for i := 1 to n do
        ITEM[i] := i;
    last := n;
    for i := 1 to k do
    begin
        s := RAND[1,last];
        SAMPLE[i] := ITEM[s];
        ITEM[s] := ITEM[last];
        last := last-1
    end
end;
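
The procedure select translates almost line for line into Python. In
the sketch below the 1-based array ITEM is stored in a 0-based list,
and the random number source rng is an assumed parameter.

```python
import random

def select(k, n, rng):
    """Random sample of k distinct items out of 1..n."""
    item = list(range(1, n + 1))     # ITEM[i] = i
    last = n
    sample = []
    for _ in range(k):
        s = rng.randint(1, last)     # position among the items still available
        sample.append(item[s - 1])
        item[s - 1] = item[last - 1] # move the last available item into the hole
        last -= 1
    return sample

sample = select(4, 10, random.Random(11))
assert len(set(sample)) == 4 and all(1 <= x <= 10 for x in sample)
```

Overwriting the chosen slot with the last available item keeps the
available items in a contiguous prefix, so each draw costs constant
time after the O(n) initialisation.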

LEMMA 3.5

The procedure select generates a random sample of k
out of n items in O(n) time using O(n) space.

proof:

Refer to [GH77].

□


Let the number of nodes at the i th level be given by
NUM_NODE[i]. Assume that we have built the binary tree up to
the i th level and we want to build the (i+1) th level. The
choice of NUM_NODE[i+1] depends upon the following
constraints:

1. NUM_NODE[i+1] belongs to the interval
[1, n - Σ NUM_NODE[j]], the sum being taken over j = 1,..,i.

2. 1 ≤ NUM_NODE[i+1] ≤ 2*NUM_NODE[i].

After selecting NUM_NODE[i+1] at random satisfying the
above constraints, the LSON and RSON relations for the i th
level have to be decided. The NUM_NODE[i+1] nodes are
divided into two parts L[i+1] and R[i+1], the number of
nodes which form the left sons and those which form the
right sons of the nodes at the i th level respectively. The
only constraint which has to be satisfied is that neither
L[i+1] nor R[i+1] should be greater than NUM_NODE[i]. Having
selected L[i+1] and R[i+1], the random sampling algorithm
select is used to select L[i+1] and R[i+1] nodes
respectively (stored in LSELECT and RSELECT respectively) from
NUM_NODE[i]. If some node is selected as one of the L[i+1],
it has a left son in the tree. The case of nodes having
right sons is similar. The process is continued till the
number of nodes remaining for the next level is reduced to
zero. The process ensures that every node except the root is
either the left son or the right son of some other node.
Besides, each node can have at most two children.


The procedure ran_bintree1 described below is just a
straightforward implementation of the technique described
above for generation of a random binary tree.


proc ran_bintree1 ( n ; root, LSON, RSON );
var
    /* input */
    n : integer;

    /* output */
    root : integer;
    LSON,RSON : array [1..n] of 0..n;

    /* local */
    level,tnode,tempn,i : integer;
    L,R,LSAMPLE,RSAMPLE,NUM_NODE : array [1..n] of 1..n;

begin
I1: for i := 1 to n do
    begin
        LSON[i] := 0;
        RSON[i] := 0
    end;
    level := 1; tnode := 1; root := 1; tempn := n-1;
    NUM_NODE[0] := 0; NUM_NODE[1] := 1;

I2: while tempn > 0 do
    begin
        level := level + 1;
        NUM_NODE[level] := RAND[1,
                 min (2 * NUM_NODE[level-1], tempn)];
        L[level] := RAND[1, min (NUM_NODE[level],
                                 NUM_NODE[level-1])];
        R[level] := NUM_NODE[level] - L[level];

I3:     select (L[level], NUM_NODE[level-1], LSAMPLE);

I4:     select (R[level], NUM_NODE[level-1], RSAMPLE);

I5:     for i := 1 to L[level] do
            LSON[NUM_NODE[level-2]+LSAMPLE[i]] :=
                                     tnode + i;

I6:     for i := 1 to R[level] do
            RSON[NUM_NODE[level-2]+RSAMPLE[i]] :=
                                     tnode + L[level] + i;
        tnode := tnode + NUM_NODE[level];
        tempn := tempn - NUM_NODE[level]
    end
end;
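
The level-by-level construction can be sketched in Python as below.
This is an illustration under stated assumptions rather than a
transcription of the procedure: the function name follows the
procedure, rng.sample stands in for select, the parents of a level are
addressed through the number of the first node of that level, and the
number of left sons is drawn from a range that guarantees neither the
left nor the right count exceeds the number of parents (so it may be
zero).

```python
import random

def ran_bintree1(n, rng):
    """Build a random binary tree on nodes 1..n, level by level.
    Returns (lson, rson), lists indexed 1..n with 0 meaning 'no son'."""
    lson = [0] * (n + 1)
    rson = [0] * (n + 1)
    first, width = 1, 1      # first node number and size of the current level
    tnode = 1                # number of nodes placed so far
    remaining = n - 1
    while remaining > 0:
        new = rng.randint(1, min(2 * width, remaining))
        # Keep both the left and the right count within `width` parents.
        left = rng.randint(max(0, new - width), min(new, width))
        right = new - left
        lparents = rng.sample(range(width), left)    # offsets within the level
        rparents = rng.sample(range(width), right)
        for i, off in enumerate(lparents, start=1):
            lson[first + off] = tnode + i
        for i, off in enumerate(rparents, start=1):
            rson[first + off] = tnode + left + i
        first, width = tnode + 1, new
        tnode += new
        remaining -= new
    return lson, rson

lson, rson = ran_bintree1(9, random.Random(4))
children = sorted(c for c in lson[1:] + rson[1:] if c)
assert children == list(range(2, 10))   # every non-root node has one parent
```

Each new node is attached exactly once, either as a left son or as a
right son, and each parent slot is used at most once because the
sampled offsets are distinct.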


LEMMA 3.6

The binary tree generated by algorithm ran_bintree1
is random.

proof:

The number of levels in the tree, the number of nodes
at any level, and the parent of any node in the tree are all
selected at random. The decision whether any node is to be
the left son or the right son is also taken at random.

□


LEMMA 3.7

The procedure ran_bintree1 is optimal and takes O(n)
time using O(n) space.

proof:

If 'tlevel' is the number of levels in the binary tree
generated, the while loop of step I2 is executed 'tlevel'
times. Steps I3 and I4 take O(NUM_NODE[level-1]) time. Taking
the summation over all levels of the tree, the total time
spent executing these steps is O(n). Similarly, the time
spent in steps I5 and I6 is O(n). All other steps take
constant time. Thus the overall complexity of the algorithm is
O(n), which is optimal since it takes at least O(n) time to
output the n nodes of the binary tree. The space used is in
the form of arrays LSON[1:n] and RSON[1:n], and the
temporary stores LSAMPLE and RSAMPLE. Since at any time
|LSAMPLE| + |RSAMPLE| does not exceed O(n), the space complexity
of the algorithm is O(n), which again is optimal.

□

3.3.2 ALGORITHM ran_bintree2 

The algorithm presented in this section generates a 
random binary tree whose number of levels does not exceed 
some k, ⌈log n⌉ ≤ k ≤ n. However, if k is chosen at random 
between ⌈log n⌉ and n, a tree with a random number of levels 
can be generated. 

The algorithm proceeds in O(n) steps. At the i-th step, 
the i-th node is inserted into the binary tree built so far, 
such that the position where the node is inserted is chosen 
at random. The algorithm is similar to the algorithm select 
in that it uses O(n) extra space to choose the position 
where the new node is to be inserted at each step. The algo- 
rithm also needs to maintain the level of each node 
inserted, so as to meet the constraint on the number of lev- 
els of the tree. Two arrays POS[1:n+1] and LRBIT[1:n+1] are 
maintained, which together form the set A = 

{ (POS[i], LRBIT[i]) | i = 1, ..., last }. Each element of the set 
is an ordered pair which signifies an empty position in the 
binary tree built so far. POS[i] stores the node number and 
LRBIT[i] shows whether the left son position (LRBIT[i] = 0) 
or the right son position (LRBIT[i] = 1) of POS[i] is 
vacant. If both the left son and right son positions of any 
node are vacant there are two entries corresponding to the 
node in the set. Whenever a new node 'node' has to be 
inserted into the tree, one of the positions is chosen at 
random from the set A. The 'last' entry of the set is made 
to replace the chosen entry. The level of 'node' is made 
equal to one more than that of its parent. If the level has 
reached the predefined maximum, no new entries are made into 
the set A. However, when this is not the case, two elements 
(node, 0) and (node, 1) are added to A. 
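
A minimal Python sketch of this insertion scheme is given below for illustration. The set A is kept as a single list of (parent, bit) pairs instead of the two arrays POS and LRBIT, and the check that n nodes fit into k levels (n ≤ 2^k - 1) is an added assumption, as is the optional `rng` parameter.

```python
import random

def ran_bintree2(n, k, rng=random):
    """Insert nodes 2..n at random vacant positions, using at most k levels.

    Returns (root, lson, rson, level); arrays are 1-based, 0 = no son.
    """
    if not 1 <= n <= 2 ** k - 1:
        raise ValueError("n nodes do not fit into k levels")
    lson = [0] * (n + 1)
    rson = [0] * (n + 1)
    level = [0] * (n + 1)
    level[1] = 1
    # The set A of vacant positions: bit 0 = left son, bit 1 = right son.
    vacant = [(1, 0), (1, 1)] if k > 1 else []
    for node in range(2, n + 1):
        sel = rng.randrange(len(vacant))
        parent, bit = vacant[sel]
        if bit == 0:
            lson[parent] = node
        else:
            rson[parent] = node
        # The last entry replaces the chosen one, so removal costs O(1).
        vacant[sel] = vacant[-1]
        vacant.pop()
        level[node] = level[parent] + 1
        if level[node] < k:
            vacant.append((node, 0))
            vacant.append((node, 1))
    return 1, lson, rson, level
```

Note that the level of the new node is read from its parent before the chosen entry is overwritten by the last one.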

LEMMA 3.8 

The procedure described above and implemented in proc 
ran_bintree2 generates a binary tree with n nodes whose 
number of levels is bounded by some k, ⌈log n⌉ ≤ k ≤ n. 

proof: 

Every node inserted is either the left son or the right 
son of some other node. Also, corresponding to any node there 
may be at most two entries in the set A, one corresponding 
to the left son position and the other to the right son 
position. Therefore each node may have at most one left son 
and one right son. Thus the structure generated is a binary 
tree. If any node is inserted at the k-th level, the left 
and right son positions are not added to the set A of vacant 
positions and hence no node lies at the (k+1)-th level. 


[] 


proc ran_bintree2 ( n ; root, LSON, RSON ); 
var 
    /* input */ 
    n : integer; 

    /* output */ 
    root : integer; 
    LSON, RSON : array [1..n] of 0..n; 

    /* local */ 
    LRBIT : array [1..n+1] of 0..1; 
    POS : array [1..n+1] of 1..n; 
    LEVEL : array [1..n] of 1..n; 
    i, sel, last : integer 

    /* k, the bound on the number of levels, is taken as given */ 

begin 

    for i := 1 to n do 
    begin 
        LSON[i] := 0; 
        RSON[i] := 0 
    end; 

    LEVEL[1] := 1; 
    root := 1; 
    POS[1] := POS[2] := 1; 
    LRBIT[1] := 0; 
    LRBIT[2] := 1; 
    last := 2; 

    for i := 2 to n do 
    begin 
        sel := RAND[1,last]; 
        if LRBIT[sel] = 0 then 
            LSON[POS[sel]] := i 
        else 
            RSON[POS[sel]] := i; 

        /* the level is taken from the parent before the */ 
        /* chosen entry is overwritten by the last entry */ 
        LEVEL[i] := LEVEL[POS[sel]] + 1; 
        POS[sel] := POS[last]; 
        LRBIT[sel] := LRBIT[last]; 
        last := last-1; 

        if LEVEL[i] < k then 
        begin 
            POS[last+1] := POS[last+2] := i; 
            LRBIT[last+1] := 0; 
            LRBIT[last+2] := 1; 
            last := last + 2 
        end 
    end 

end; 

LEMMA 3.9 

The binary tree generated by the procedure 
ran_bintree2 is random. 

proof: 

Whenever a new node is added, its position is selected 
at random from the set of vacant positions which are all 
equally likely and hence the binary tree generated is ran- 
dom. 


□ 

LEMMA 3.10 

The procedure ran_bintree2 runs in optimal O(n) time 
using O(n) space. 

proof: 

Each insertion of a node into the binary tree generated 
thus far takes O(1) time for a total of O(n) time over all 
insertions. This is optimal as it takes O(n) time to output 
the tree. Since a binary tree having n internal nodes has 
n+1 leaves, the size of the arrays LRBIT and POS is at most 
n+1. Also, the arrays LSON and RSON take O(n) space. Thus 
the space complexity is also optimal. 

[] 



3.4. CONCLUSION 


In section 3.2, a parallel algorithm was presented for 
generating a random unlabeled rooted tree having n nodes and 
k levels (1 ≤ k ≤ n) in O(log k) time with O(n) processors. The 
result was generalized to the case of a random labeled rooted 
tree, which could be computed in O((log n)^2) time using O(n) 
processors. In section 3.3, two optimal sequential algorithms 
for generating a random binary tree with n nodes were 
presented. Out of the two, algorithm ran_bintree2 was able 
to generate a binary tree with at most k levels. Both the 
algorithms ran in O(n) time and used O(n) space. The idea 
used in the case of unlabeled rooted trees cannot be gen- 
eralised to binary trees since they have the constraint that 
the number of nodes at the i-th level is at most twice the 
number of nodes at the (i-1)-th level. Therefore the (i+1)-th 
level cannot be generated before the i-th, and this makes the 
process sequential. 
CHAPTER 4 

GENERATION OF BINARY TREES 


4.1. INTRODUCTION 

The binary tree, as a data structure, appears extensively 
in various applications in computer science. Thus, the prob- 
lem of generation of binary trees has motivated considerable 
research activity [PR80], [LLW86]. 

In this chapter a coding scheme has been described for 
binary trees and two algorithms for generation of binary 
trees in lexicographic order on a parallel machine are pro- 
posed. The first algorithm generates the trees one after the 
other in lexicographic order. The second algorithm generates 
all trees simultaneously. Both the algorithms achieve 
optimal speedup ratios . 

In section 4.2 the theoretical background for the prob- 
lem is developed while section 4.3 is concerned with presen- 
tation of the algorithms for generation of binary trees. 

4.2. DEFINITIONS AND THEORETICAL BACKGROUND 

Let us consider a binary tree B with n nodes. Consider 
another binary tree B' which is generated from B by adding 
the n+1 leaf nodes to it. B' may be defined to be the 2-ary 
form of B. The nodes of B in B' are the internal nodes. If 
we label the internal nodes of B' by 1 and the leaf nodes by 
0 and read out all the labels in breadth-first order, a 
code of size 2n+1 is generated for the tree B. Neglecting 
the label corresponding to the rightmost leaf node, which 
will always be 0, we get a code of size 2n. An example is 
shown in the figure. 



The lexicographic order of generation of binary trees 
is the dictionary order, i.e. the one in which, given the 
codes for any two binary trees, the numerically smaller tree 
is generated before the larger one. The ranking and 
unranking problems for binary trees are defined as follows: 

Ranking - Given the code for a binary tree, find its 
rank in the lexicographic order of generation. 
Unranking - Given a rank k, find the code of the 
binary tree corresponding to it in the lexicographic 
order of generation. 

The terms 'code of a binary tree' and 'binary tree' 
have been used interchangeably in this chapter. 



LEMMA 4.1 


The code for the binary trees produced as above has the 
following properties: 

1. Number of 1's = Number of 0's = n. 

2. In any prefix of the code, the number of 1's is 
greater than or equal to the number of 0's. 

proof: 

The first property is obvious since there are n inter- 
nal nodes and n+1 leaf nodes of the binary tree and we have 
neglected the last leaf node. If any prefix of the code is 
taken it corresponds to a binary tree which has not been 
fully converted to its 2-ary form (i.e. all leaf nodes have 
not been added). Therefore, the number of 1's in the code of 
such a tree is at least equal to the number of zeroes. 

[] 

It follows from the preceding lemma that the i-th 1 in 
the code has a position of at most (2i-1). 
So instead of the code of size 2n, a new code of size n can 
be created where the i-th number in the code gives the posi- 
tion of the i-th 1 in the original code. For example, the new 
code for the above case is (1 2 4 5 8 9). 
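
As an illustration, the following Python sketch computes both codes for a tree stored in LSON/RSON form. The function name `tree_codes` and the six-node tree used in the usage lines are assumptions chosen for this example; they are not taken from the figure.

```python
from collections import deque

def tree_codes(root, lson, rson):
    """Return (bit code of size 2n, positions of the 1s) for a binary tree.

    lson/rson are 1-based arrays in which 0 means "no son".
    """
    bits = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node == 0:             # an added leaf of the 2-ary form
            bits.append(0)
            continue
        bits.append(1)            # an internal node, i.e. a node of B
        queue.append(lson[node])
        queue.append(rson[node])
    bits.pop()                    # drop the last leaf label, always 0
    positions = [i for i, b in enumerate(bits, start=1) if b == 1]
    return bits, positions

# A 6-node tree chosen for illustration (index 0 is unused).
lson = [0, 2, 3, 0, 5, 0, 0]
rson = [0, 0, 4, 0, 6, 0, 0]
bits, positions = tree_codes(1, lson, rson)
# bits      == [1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0]
# positions == [1, 2, 4, 5, 8, 9]
```

The breadth-first queue visits the leaves of the 2-ary form as zeros, so the bit code has 2n+1 entries before the final label is dropped.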



LEMMA 4.2 


The new code has the following properties: 

1) The size of the code is n. 

2) The value of the i-th element of the code is at least i 
and at most 2i-1. 

3) The elements of the code are in increasing order. 

proof: 

Since there are n 1's in the original code, the size of 
the new code is n. Since the original code satisfies the 
prefix property, i.e. the number of 1's in any prefix is 
never less than the number of 0's, the position of the 
i-th 1 can at most be equal to 2i-1; it is at least i since 
i-1 other 1's precede it. (3) is obvious from 
the fact that the position of the i-th 1 is less than the 
position of the (i+1)-th 1. 

[] 

Let us consider an example and see how the binary trees 
are generated lexicographically . 

EXAMPLE 4.1 
Let n = 4. 

The minimum tree generated in the lexicographic order 
is (1 2 3 4) and the maximum is (1 3 5 7). The complete 
order of generation is as follows: 

(1 2 3 4) 
(1 2 3 5) 
(1 2 3 6) 
(1 2 3 7) 
(1 2 4 5) 
(1 2 4 6) 
(1 2 4 7) 
(1 2 5 6) 
(1 2 5 7) 
(1 3 4 5) 
(1 3 4 6) 
(1 3 4 7) 
(1 3 5 6) 
(1 3 5 7) 



It can easily be verified that the number of binary 
trees with n nodes is given by 

$$BT(n) = \frac{1}{n+1}\binom{2n}{n}$$ (ref. [Z80]) 

In the above example BT(4) = 14. 
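
The count can be checked by brute force. The sketch below (function names are illustrative assumptions) enumerates every increasing code whose i-th element is at most 2i-1, in lexicographic order, and compares the number of codes with the formula for BT(n).

```python
from math import comb

def lex_codes(n):
    """Yield all codes a_1 < ... < a_n with a_i <= 2i-1, in lexicographic order."""
    def extend(prefix):
        i = len(prefix) + 1
        if i > n:
            yield tuple(prefix)
            return
        start = prefix[-1] + 1 if prefix else 1
        for value in range(start, 2 * i):   # upper bound 2i-1 inclusive
            yield from extend(prefix + [value])
    yield from extend([])

def bt(n):
    """Number of binary trees with n nodes, (1/(n+1)) * C(2n, n)."""
    return comb(2 * n, n) // (n + 1)

codes = list(lex_codes(4))
# len(codes) == bt(4) == 14; codes[0] == (1, 2, 3, 4); codes[-1] == (1, 3, 5, 7)
```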


On careful observation one can clearly see a pattern in 
the lexicographically ordered codes. In fact, the codes gen- 
erated in the above example follow the pattern of the fol- 
lowing structure: 


Row 1:  14 trees have first element 1 

Row 2:   9 trees have second element 2 
         5 trees have second element 3 

Row 3:   4 trees have third element 3 
         3 trees have third element 4 
         2 trees have third element 5 

Row 4:   1 tree has fourth element 4 
         1 tree has fourth element 5 
         1 tree has fourth element 6 
         1 tree has fourth element 7 

(The entries of the structure are linked by FIRST and NEXT 
pointers.) 


Let the above structure be stored in an array 
TRIANGLE[1:n, 1:n]. It is easy to see that the elements of 
TRIANGLE satisfy the following properties: 

1) TRIANGLE[i,i] = BT(n-i+1), 

2) TRIANGLE[n,i] = 1, 1 ≤ i ≤ n, 

3) TRIANGLE[i,j] = TRIANGLE[i+1,j] + TRIANGLE[i,j+1], i > j, 
1 ≤ i ≤ n-1, 1 ≤ j ≤ n-1. 

In general, for i < n, 

$$\mathrm{TRIANGLE}[i,j] = \frac{i-j+2}{n-i}\binom{2n-i-j+1}{n-i-1}$$ 

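LEMMA 4.3 

Each row of TRIANGLE contains partial sums of the ele- 
ments of the next row, i.e. 

$$\mathrm{TRIANGLE}[i,j] = \sum_{k=j}^{i+1} \mathrm{TRIANGLE}[i+1,k]$$ 

proof: 

By construction, 

$$\begin{aligned}
\mathrm{TRIANGLE}[i,j] &= \mathrm{TRIANGLE}[i+1,j] + \mathrm{TRIANGLE}[i,j+1] \\
&= \mathrm{TRIANGLE}[i+1,j] + \mathrm{TRIANGLE}[i+1,j+1] + \mathrm{TRIANGLE}[i,j+2] \\
&= \mathrm{TRIANGLE}[i+1,j] + \cdots + \mathrm{TRIANGLE}[i+1,i-1] + \mathrm{TRIANGLE}[i,i].
\end{aligned}$$ 

Now, 

$$\mathrm{TRIANGLE}[i,j] = \frac{i-j+2}{n-i}\binom{2n-i-j+1}{n-i-1}$$ 

Therefore, 

$$\mathrm{TRIANGLE}[i+1,i+1] = \frac{2}{n-i-1}\binom{2n-2i-1}{n-i-2} = \frac{2\,(2n-2i-1)!}{(n-i-1)!\,(n-i+1)!}$$ 

$$\mathrm{TRIANGLE}[i+1,i] = \frac{3}{n-i-1}\binom{2n-2i}{n-i-2} = \frac{3\,(2n-2i)!}{(n-i-1)!\,(n-i+2)!}$$ 

so that 

$$\mathrm{TRIANGLE}[i+1,i] + \mathrm{TRIANGLE}[i+1,i+1] = \frac{2\,(2n-2i+1)!}{(n-i)!\,(n-i+2)!} = \mathrm{TRIANGLE}[i,i].$$ 

Thus, 

$$\mathrm{TRIANGLE}[i,j] = \sum_{k=j}^{i+1} \mathrm{TRIANGLE}[i+1,k]$$ 

□ 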
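
The relation between the rows can be checked numerically. The following sketch (with assumed helper names `triangle` and `closed_form`) fills the array bottom-up from the partial sums of the lemma and compares every entry above the last row with the closed-form expression.

```python
from fractions import Fraction
from math import comb

def triangle(n):
    """TRIANGLE as a 1-based table; only entries with 1 <= j <= i <= n are used."""
    t = [[0] * (n + 2) for _ in range(n + 2)]
    for j in range(1, n + 1):
        t[n][j] = 1                        # last row: one tree per entry
    for i in range(n - 1, 0, -1):
        running = t[i + 1][i + 1]
        for j in range(i, 0, -1):          # partial sums of row i+1
            running += t[i + 1][j]
            t[i][j] = running
    return t

def closed_form(n, i, j):
    """((i-j+2)/(n-i)) * C(2n-i-j+1, n-i-1), valid for i < n."""
    return Fraction(i - j + 2, n - i) * comb(2 * n - i - j + 1, n - i - 1)

t = triangle(4)
# rows of t for n = 4: [14], [9, 5], [4, 3, 2], [1, 1, 1, 1]
```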


Each element of the array TRIANGLE defines a set of 
trees having a particular integer at a specified position 
in the code. Thus, after generating the array TRIANGLE, the 
unranking procedure, i.e. generating the tree corresponding 
to a rank, becomes very obvious. The TRIANGLE is traversed 
using the FIRST and NEXT pointers to find the sets of trees 
to which the tree with a given rank belongs, with the i-th 
row of TRIANGLE contributing the i-th element to the code of 
the tree. 
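
A sequential Python sketch of this traversal is shown below. It is an illustration under assumptions: the function names are hypothetical, TRIANGLE is rebuilt inside the sketch from its defining relations, and moving right within a row or down to the next row is expressed with plain indices instead of explicit pointers.

```python
def build_triangle(n):
    """Rows i = 1..n, columns j = 1..i (1-based; other cells stay 0)."""
    t = [[0] * (n + 2) for _ in range(n + 2)]
    for j in range(1, n + 1):
        t[n][j] = 1
    for i in range(n - 1, 0, -1):
        for j in range(i, 0, -1):
            if j == i:
                t[i][j] = t[i + 1][i] + t[i + 1][i + 1]
            else:
                t[i][j] = t[i + 1][j] + t[i][j + 1]
    return t

def unrank(rank, n, t=None):
    """Code of the tree with the given 1-based rank in lexicographic order."""
    t = t or build_triangle(n)
    code = [1]
    j = 1
    for v in range(2, n + 1):
        value = code[-1] + 1          # smallest admissible v-th element
        while t[v][j] < rank:         # skip a whole block of trees: move right
            rank -= t[v][j]
            j += 1
            value += 1
        code.append(value)            # block found: continue in the next row
    return code

first = unrank(1, 4)     # [1, 2, 3, 4]
last = unrank(14, 4)     # [1, 3, 5, 7]
```

Each call visits at most 2n entries, since the row index and the column index never decrease during the traversal.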


4.3. THE GENERATION ALGORITHMS 

The first algorithm, par_lex_bintree, generates all 
the binary trees with n nodes in lexicographic order, one 
after the other. The algorithm makes use of O(n) processors 
and O(BT(n)) time, where BT(n) gives the total number of 
binary trees with n nodes. This is optimal since it takes 
O(n) time on a serial computer to output one tree. The 
second algorithm, par_gen_bintree, generates all the 
binary trees simultaneously in O(⌈BT(n)/P⌉·n) time, where P 
is the number of processors available. This again achieves 
optimal speedup ratio. 

4.3.1 ALGORITHM par_lex_bintree 

Let A[1:n] store the code of any binary tree. 

The first tree in the lexicographic order is the one in 
which A[i] = i, for all i. Given a binary tree (code), con- 
sider the problem of generation of the next binary tree in 
the lexicographic order. This requires additional 
storage in the form of an array NEXT[1:n]. NEXT[i] stores 
the index of the closest element to the left of the i-th 
position in A[1:n] which has not reached its maximum (the 
maximum element at position i is 2i-1). Thus NEXT[i] = i-1 to 
start with. Also, the pointer 'pointr' gives the minimum 
index from which onwards the code of the previous tree has 
to be changed in order to get the new code. Obviously, the 
minimum next tree that can be generated is the one in which 
A[pointr] := A[pointr]+1, and for all i > pointr, A[i] := 
A[i-1]+1. 

After generating the tree the array NEXT and the 
pointer 'pointr' may be modified for subsequent generations 
as follows: 

If after incrementing, A[pointr] has reached its max- 
imum (i.e. A[pointr] = 2·pointr-1), then if pointr <> n, 
NEXT[pointr+1] is set to (pointr-1). This is consistent with 
the definition of the array NEXT. Since none of the elements 
with index greater than 'pointr' have reached their maximum, 
for each of them (except pointr+1) NEXT[i] may be set to 
i-1. The pointer 'pointr' is set to n since it is the max- 
imum index which has not reached its maximum (thus it is the 
minimum index which has to be changed in order to get the 
next tree). However, if pointr = n, 'pointr' may simply be 
set to NEXT[pointr]. 


If, after incrementing, A[pointr] has not reached its 
maximum, then for all i > pointr, NEXT[i] is set to its ini- 
tial value i.e. i-1 (since none of the elements beyond 
'pointr' have reached their maximum). 
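
The successor rule itself can be sketched sequentially in Python as follows. This is a simplified illustration and not the parallel procedure: it finds 'pointr' by a linear scan from the right instead of maintaining the array NEXT, so it shows only which code comes next, not the constant-time bookkeeping described above. The function name `next_code` is an assumption.

```python
def next_code(a):
    """Return the lexicographic successor of code a, or None after the last code."""
    n = len(a)
    # Rightmost position (1-based) whose element is below its maximum 2i-1.
    pointr = next((i for i in range(n, 0, -1) if a[i - 1] < 2 * i - 1), None)
    if pointr is None:
        return None
    b = list(a)
    b[pointr - 1] += 1
    for i in range(pointr, n):        # 0-based indices of the positions after pointr
        b[i] = b[i - 1] + 1           # smallest values that keep the code increasing
    return b

codes = []
code = [1, 2, 3, 4]
while code is not None:
    codes.append(tuple(code))
    code = next_code(code)
# len(codes) == 14, from (1, 2, 3, 4) up to (1, 3, 5, 7)
```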


LEMMA 4.4 


The procedure described above can be used to generate 
all the BT(n) binary trees with n nodes, lexicographically. 


[] 


The procedure par_lex_bintree implements LEMMA 4.4. 

proc par_lex_bintree (p,n; n) 
var 
    /* input */ 
    n : integer; 

    /* output */ 
    A : array of 1..2n-1; 

    /* local */ 
    i, u, count, pointr : integer; 
    NEXT : array [1..n] of 1..n 

begin 

11: for i := p to p+n-1 pardo 
        NEXT[i-p+1] := i-p 
    dopar; 



The following gives the trace of generation of all 
binary trees with n nodes in lexicographic order using pro- 
cedure par_lex_bintree. 
4.3.2 ALGORITHM par_gen_bintree 


Though the algorithm of the last section achieves optimal 
speedup, it generates all the trees one after the other. 
Therefore, it is of interest to design an algorithm which 
can generate trees simultaneously. This is what is accom- 
plished by the algorithm par_gen_bintree proposed in this 
section. The algorithm par_gen_bintree generates all the 
binary trees with n nodes in O(⌈BT(n)/P⌉·n) time, where P is 
the number of processors used. 

The algorithm needs preprocessing in the form of the 
array TRIANGLE as described in section 4.2. After this, the 
algorithm unranks all the binary trees simultaneously by 
traversing TRIANGLE. 

The procedure par_form_tri computes the array TRIAN- 
GLE. 

proc par_form_tri ( p, n^2 ; n; TRIANGLE ); 

var 
    /* input */ 
    n : integer; 

    /* output */ 
    TRIANGLE : array [1..n, 1..n] of 1..BT(n); 

    /* local */ 
    i, j, u, v : integer; 
    FACT : array [1..2n] of 1..(2n)! 

begin 

11: for i := p to p+2n-1 pardo 
        FACT[i-p+1] := i-p+1 
    dopar; 

12: for j := 0 to ⌈log 2n⌉ do 
        for i := p to p+2n-1 pardo 
        begin 
            u := i-p+1; 
            FACT[u+2^j] := FACT[u+2^j] * FACT[u] 
        end 
        dopar; 

13: for i := p to p+n-1 pardo 
        for j := p to p+i-1 pardo 
        begin 
            u := i-p+1; 
            v := j-p+1; 
            TRIANGLE[u,v] := ((u-v+2)/(n-u)) * 
                (FACT[2n-u-v+1]/(FACT[n-u-1]*FACT[n-v+2])) 
        end 
        dopar 
    dopar 

end; 
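
Step 12 is a prefix-product computation by repeated doubling. The sketch below simulates it sequentially in Python; the copy `old` models the assumption that all processors read their operands before any of them writes, and the bound on u (so that u + step stays inside the array) is an added guard that the pseudocode leaves implicit. The function name is hypothetical.

```python
def factorials_by_doubling(m):
    """Return fact with fact[u] = u! for 1 <= u <= m, using about log2(m) rounds."""
    fact = list(range(m + 1))       # fact[u] = u initially; index 0 is unused
    step = 1
    while step < m:
        old = fact[:]               # synchronous step: read everything first
        for u in range(1, m - step + 1):    # conceptually one processor per u
            fact[u + step] = old[u + step] * old[u]
        step *= 2
    return fact

fact = factorials_by_doubling(8)
# fact[1:] == [1, 2, 6, 24, 120, 720, 5040, 40320]
```

After the round with a given step, each entry holds the product of the last 2·step integers ending at its index (or of all of them, if fewer exist), which is why about log m rounds suffice.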


LEMMA 4.6 

The procedure par_form_tri computes TRIANGLE in 
O(⌈n²/P⌉ + log n) time, where P is the number of processors. 

proof: 

The procedure par_form_tri gives the implementation 
for the case when P = O(n²). However, given P processors it 
is trivial to prove that step 12 can be implemented in 
O(⌈n/P⌉ + log n) time. Step 13 of the algorithm takes 
O(⌈n²/P⌉) time. Hence the overall complexity of the algo- 
rithm is O(⌈n²/P⌉ + log n). 

[] 


The main procedure par_gen_bintree is presented next. 
The BT(n) binary trees generated are stored in 
BITREES[1:BT(n), 1:n]. 


proc par_gen_bintree ( p, BT(n); n; BITREES ); 

var 
    /* input */ 
    n : integer; 

    /* output */ 
    BITREES : array [1..BT(n), 1..n] of 1..2n-1; 

    /* local */ 
    RANK : array [1..BT(n)] of 1..BT(n); 
    i, j, k, u, v : integer 

begin 

11: par_form_tri (p, n^2; n; TRIANGLE); 

12: for i := p to p+BT(n)-1 pardo 
    begin 
        u := i-p+1; 
        RANK[u] := u; 
        BITREES[u,1] := 1; 
        j := 1; 

13:     for k := p+1 to p+n-1 do 
        begin 
            v := k-p+1; 
            BITREES[u,v] := BITREES[u,v-1] + 1; 
            while TRIANGLE[v,j] < RANK[u] do 
            begin 
                RANK[u] := RANK[u] - TRIANGLE[v,j]; 
                j := j+1; 
                BITREES[u,v] := BITREES[u,v] + 1 
            end 
        end 
    end 
    dopar 

end; 

LEMMA 4.7 


The procedure par_gen_bintree generates the BT(n) 
binary trees with n nodes in O(⌈BT(n)/P⌉·n) time, where P is 
the number of processors. 

proof: 

From LEMMA 4.6, step 11 takes O(⌈n²/P⌉ + log n) time. 
Step 12 is done in parallel for all binary trees. Each 
binary tree is generated by traversing TRIANGLE such that, 
if u and v are the two indices of the array, both u and v 
are nondecreasing. The traversal is carried out in the same 
fashion as was demonstrated in section 4.2 of this chapter. 
Thus each traversal takes O(n) time. Since P traversals can 
be done simultaneously, the complexity of step 12 is 
O(⌈BT(n)/P⌉·n). Hence the complexity of the algorithm 
par_gen_bintree is O(⌈BT(n)/P⌉·n). 


EXAMPLE 4.3 

The present example uses procedure par_gen_bintree to 
generate all binary trees with n=5. 

The array TRIANGLE computed in step 11 is as follows: 

        j=1   j=2   j=3   j=4   j=5 
i=1      42 
i=2      28    14 
i=3      14     9     5 
i=4       5     4     3     2 
i=5       1     1     1     1     1 



Let us generate the 20th binary tree. The following 
gives the trace showing how BITREES[20,1:5] is generated. 

Initialize, 

RANK[20] := 20 

BITREES[20,1] := 1 

The elements of TRIANGLE are traversed as follows: If 
the current element being examined, TRIANGLE[u,v], is less 
than RANK[20] then subtract TRIANGLE[u,v] from RANK[20] and 
go to TRIANGLE[u,v+1]. Otherwise go to TRIANGLE[u+1,v]. 


ELEMENT OF TRIANGLE 
EXAMINED 
   u=    v=      RANK[20]    BITREES[20,u] 
   2     1         20            (2) 
   3     1          6             3 
   3     2          6            (4) 
   4     2          2             5 
   4     3          2            (6) 
   5     3          1             7 
   5     4          1            (8) 

Values in parentheses are final. 

Thus the 20th binary tree is (1 2 4 6 8). 

4.4. CONCLUSION 



In this chapter two procedures, par_lex_bintree and 
par_gen_bintree, have been presented for the algorithmic 
generation of all binary trees with n nodes, lexicographi- 
cally, on a parallel random access machine. The procedure 
par_lex_bintree generates all the trees in O(BT(n)) time 
with O(n) processors. The procedure par_gen_bintree uses 
an unranking procedure to generate all binary trees simul- 
taneously in O(⌈BT(n)/P⌉·n) time, P being the number of pro- 
cessors. Both the algorithms are cost optimal. In procedure 
par_gen_bintree, the problem has been parallelised only to 
the extent that all unrankings are performed simultaneously. 
However, a quick and easy parallel solution to the unranking 
problem itself seems improbable. In fact, developing more 
efficient unranking procedures is an open problem [CDC86]. 


The procedure par_lex_bintree can be generalised to 
the case of K-ary trees (i.e. those trees in which no node 
has more than K sons) by simply replacing 2i-1 by K(i-1)+1 